Commit f2960058 authored by Baptiste Bauvin

Doc and corrections

parent 8e48ae02
Pipeline #4420 passed
Showing with 370 additions and 107 deletions
@@ -14,4 +14,8 @@ build*
dist*
multiview_platform/.idea/*
.gitignore
multiview_platform/examples/results/example_1/*
multiview_platform/examples/results/example_2/*
multiview_platform/examples/results/example_3/*
multiview_platform/examples/results/example_4/*
multiview_platform/examples/results/example_5/*
@@ -79,17 +79,17 @@ In order to start a benchmark on your own dataset, you need to format it so SuMM
[comment]: <> (With `top_directory` being the last directory in the `pathF` argument)
##### If you already have an HDF5 dataset file, it must be formatted as :
* One dataset for each view called `ViewI`, with `I` being the view index, with 3 attributes :
    * `attrs["name"]` a string for the name of the view
    * `attrs["sparse"]` a boolean specifying whether the view is sparse or not (WIP)
    * `attrs["limits"]` a `np.array` containing all the limits of the attributes in the view (for ex. : for a pixel the limits will be `[0, 255]`, for a real attribute in [-1,1], the limits will be `[-1,1]`).
* One dataset for the labels called `Labels` with one attribute :
    * `attrs["names"]` a list of strings encoded in utf-8 naming the labels in the right order
* One group for the additional data called `Metadata` containing at least 1 dataset :
    * `"example_ids"`, a numpy array of type `S100`, with the ids of the examples in the right order
* The `Metadata` group must also bear three attributes :
    * `attrs["nbView"]` an int counting the total number of views in the dataset
    * `attrs["nbClass"]` an int counting the total number of different labels in the dataset
    * `attrs["datasetLength"]` an int counting the total number of examples in the dataset
@@ -115,4 +115,5 @@ It is highly recommended to follow the documentation's [tutorials](http://baptis
### Contributors
* **Dominique BENIELLI**
* **Alexis PROD'HOMME**
# The base configuration of the benchmark
log: True
name: ["doc_summit",]
label: "_"
file_type: ".hdf5"
views:
pathf: "/home/baptiste/Documents/Gitwork/multiview_generator/demo/"
nice: 0
random_state: 42
nb_cores: 1
full: False
debug: True
add_noise: False
noise_std: 0.0
@@ -17,10 +17,10 @@ track_tracebacks: False
# All the classification-related configuration options
multiclass_method: "oneVersusOne"
split: 0.5
nb_folds: 2
nb_class: 2
classes: ["label_1", "label_6"]
type: ["monoview", "multiview"]
algos_monoview: ["decision_tree"]
algos_multiview: ["weighted_linear_early_fusion", "weighted_linear_late_fusion"]
...
Welcome to Supervised MultiModal Integration Tool's documentation !
===================================================================

This package has been designed as an easy-to-use platform to estimate different mono- and multi-view classifiers' performances on a multiview dataset.
The main advantage of the platform is that it allows the user to add and remove a classifier without modifying its core code (the procedure is described thoroughly in this documentation).
This documentation consists of a short readme, with instructions to install and get started with SuMMIT, then several use cases to discover the features, and all the documented sources.

.. note::
    The documentation, the platform and the tests are constantly being updated.
    All the content labelled WIP is Work In Progress.

.. toctree::
    :maxdepth: 1

    readme_link
    tutorials/index
...
.. role:: python(code)
    :language: python

.. role:: yaml(code)
    :language: yaml
===============================================
Example 1 : First steps with SuMMIT
===============================================

Context
@@ -14,15 +19,52 @@ Adding a new classifier (monoview and/or multiview) to the benchmark as been mad
customize the set of classifiers and test their performances in a controlled environment.

Introduction to this tutorial
-----------------------------
This tutorial will show you how to use the platform on simulated data, for the simplest problem : vanilla multiclass classification.
The data is naively generated with a soon-to-be published multiview generator that allows control of the redundancy, mutual error and complementarity among the views.
For all the tutorials, we will use the same dataset.
A generated dataset to rule them all
------------------------------------

The dataset that will be used in the examples consists of :
+ 500 examples that are either

    + mis-described by all the views (labelled ``Mutual_error_*``),
    + well-described by all the views (labelled ``Redundant_*``),
    + well-described by the majority of the views (labelled ``Complementary_*``),
    + randomly well- or mis-described by the views (labelled ``example_*``).

+ 8 balanced classes named ``'label_1'``, ..., ``'label_8'``,
+ 4 views named ``'generated_view_1'``, ..., ``'generated_view_4'``,
+ each view consisting of 3 features.
It has been parametrized with the following error matrix :
+---------+--------+--------+--------+--------+
| | View 1 | View 2 | View 3 | View 4 |
+=========+========+========+========+========+
| label_1 | 0.40 | 0.40 | 0.40 | 0.40 |
+---------+--------+--------+--------+--------+
| label_2 | 0.55 | 0.40 | 0.40 | 0.40 |
+---------+--------+--------+--------+--------+
| label_3 | 0.40 | 0.50 | 0.60 | 0.55 |
+---------+--------+--------+--------+--------+
| label_4 | 0.40 | 0.50 | 0.50 | 0.40 |
+---------+--------+--------+--------+--------+
| label_5 | 0.40 | 0.40 | 0.40 | 0.40 |
+---------+--------+--------+--------+--------+
| label_6 | 0.40 | 0.40 | 0.40 | 0.40 |
+---------+--------+--------+--------+--------+
| label_7 | 0.40 | 0.40 | 0.40 | 0.40 |
+---------+--------+--------+--------+--------+
So this means that view 1 should make at least 40% error on label 1 and 55% error on label 2.
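
To take a quick look at the dataset file itself, here is a minimal sketch with ``h5py``, following the HDF5 format described in the readme (the path is deduced from the example configuration and may differ on your setup) :

.. code-block:: python

    import h5py

    # path assumed : pathf + name + file_type from the example's config file
    with h5py.File("examples/data/doc_summit.hdf5", "r") as dataset:
        print(dataset["Metadata"].attrs["nbView"])         # 4
        print(dataset["Metadata"].attrs["nbClass"])        # 8
        print(dataset["Metadata"].attrs["datasetLength"])  # 500
        print(dataset["View0"].attrs["name"])  # e.g. 'generated_view_1'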
Getting started
---------------

@@ -41,25 +83,26 @@ We will decrypt the main arguments :
+ The first part regroups the basics :

    - :yaml:`log: True` allows the log to be printed in the terminal,
    - :yaml:`name: ["doc_summit"]` uses the simulated dataset described above,
    - :yaml:`random_state: 42` fixes the seed of the random state for this benchmark, it is useful for reproducibility,
    - :yaml:`full: True` means the benchmark will use the full dataset,
    - :yaml:`res_dir: "examples/results/example_1/"` saves the results in ``multiview-machine-learning-omis/multiview_platform/examples/results/example_1``
+ Then the classification-related arguments :

    - :yaml:`split: 0.25` means that 25% of the dataset will be used to test the different classifiers and 75% to train them,
    - :yaml:`type: ["monoview", "multiview"]` allows for monoview and multiview algorithms to be used in the benchmark,
    - :yaml:`algos_monoview: ["decision_tree"]` runs a decision tree on each view,
    - :yaml:`algos_multiview: ["weighted_linear_early_fusion", "weighted_linear_late_fusion"]` runs an early and a late fusion,
    - The metrics configuration ::

        metrics:
          accuracy_score: {}
          f1_score:
            average: "micro"
means that the benchmark will evaluate the performance of each algorithm on accuracy, and on f1-score with a micro average (because of the multiclass setting).
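
As a reminder, in the multiclass single-label setting the micro-averaged f1-score counts the true and false positives over all the classes at once, so it boils down to the accuracy; a quick check with scikit-learn (made-up predictions) :

.. code-block:: python

    from sklearn.metrics import f1_score

    y_true = [0, 1, 2, 2, 1, 0]
    y_pred = [0, 2, 2, 2, 1, 1]
    # 4 correct predictions out of 6
    print(f1_score(y_true, y_pred, average="micro"))  # 0.666...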
**Start the benchmark**
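
Assuming SuMMIT is installed, running the example from the project's root directory should look like the following sketch (the ``execute`` entry point is our assumption here; check the installation tutorial for the exact invocation) :

.. code-block:: python

    from multiview_platform.execute import execute

    # runs the benchmark with the configuration file of example 1
    execute("example 1")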
@@ -76,9 +119,8 @@ The execution should take less than five minutes. We will first analyze the resu

The result structure can be startling at first, but, as the platform provides a lot of information, it has to be organized.
The results are stored in ``multiview_platform/examples/results/example_1/``. Here, you will find a directory with the name of the database used for the benchmark, here : ``doc_summit/``
Then, a directory with the date and time of the beginning of the experiment. Let's say you started the benchmark on the 25th of December 1560,
at 03:42 PM, the directory's name should be ``started_1560_12_25-15_42/``.
From here the result directory has the structure that follows :

@@ -86,23 +128,13 @@ From here the result directory has the structure that follows :

.. code-block:: bash
    | started_1560_12_25-15_42
    | ├── decision_tree
    | | ├── generated_view_1
    | | | ├── *-summary.txt
    | | | ├── <other classifier dependant files>
    | | ├── generated_view_2
    | | | ├── *-summary.txt
    | | | ├── <other classifier dependant files>
    | ├── [..
    | ├── ..]
    | ├── weighted_linear_late_fusion
@@ -137,27 +169,31 @@ From here the result directory has the structure that follows :

    | └── random_state.pickle
The structure may seem complex, but it provides a lot of information, from the most general to the most precise.

Let's comment each file :
``*-accuracy_score*.html``, ``*-accuracy_score*.png`` and ``*-accuracy_score*.csv``
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
These files contain the scores of each classifier for the accuracy metric, ordered with the best ones on the right and the worst ones on the left, as an interactive html page, an image or a csv matrix. The star after ``accuracy_score*`` means that it was the principal metric (the usefulness of the principal metric will be explained later).

The html version is as follows :
.. raw:: html
    :file: ./images/example_1/accuracy.html
This is a bar plot showing the score on the training set (light gray) and on the testing set (black) for each monoview classifier on each view, and for each multiview classifier.
Here, the generated dataset is built to introduce some complementarity amongst the views. As a consequence, the two multiview algorithms, even if they are naive, have a better score than the decision trees.
The ``.csv`` file is a matrix with the score on train stored in the first row and the score on test stored in the second one. Each classifier is presented in a row. It is loadable with pandas.
A similar graph, ``*-accuracy_score*-class.html``, reports the error of each classifier on each class.

.. raw:: html
    :file: ./images/example_1/accuracy_class.html

Here, for each classifier, 8 bars are plotted, one for each class. It is clear that for the monoview algorithms, in views 2 and 3, the third class is difficult, as shown in the error matrix.
``*-error_analysis_2D.png`` and ``*-error_analysis_2D.html``
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

@@ -169,17 +205,20 @@ Below, ``*-error_analysis_2D.html`` is displayed.

It is the representation of a matrix, where the rows are the examples, and the columns are the classifiers.
The examples labelled as ``Mutual_error_*`` are mis-classified by most of the algorithms, the redundant ones are well-classified, and the complementary ones are classified with mixed success.

.. note::
    It is highly recommended to zoom in the html figure to see each row.
.. raw:: html
    :file: ./images/example_1/error_2d.html

This figure is the html version of the classifiers errors' visualization. It is interactive, so, by hovering it, the information on
each classifier and example is printed. The classifiers are ordered as follows:
From left to right : all the monoview classifiers on the first view, all the ones on the second one, ..., then at the far right, the multiview classifiers.

This html image is also available in ``.png`` format, but is then not interactive, so harder to analyze.
@@ -190,23 +229,17 @@ It could mean that the example is incorrectly labeled in the dataset or is very

Symmetrically, a mainly-black column means that a classifier spectacularly failed on the asked task.
The data used to generate those matrices is available in ``*-2D_plot_data.csv``
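
For instance, here is a sketch ranking the examples by failure ratio (the path is a placeholder, and the layout, rows for examples and columns for classifiers with 1 for a well-classified example, is an assumption to check on your version) :

.. code-block:: python

    import pandas as pd

    # placeholder path ; assumed layout : rows = examples, columns = classifiers
    data = pd.read_csv("path/to/2D_plot_data.csv", index_col=0)
    failure_ratio = 1 - data.mean(axis=1)  # assumes 1 = well-classified
    print(failure_ratio.sort_values(ascending=False).head())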
``*-error_analysis_bar.png`` and ``*-error_analysis_bar.html``
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

This file is a different way to visualize the same information as the two previous ones. Indeed, it is a bar plot, with a bar for each example, counting the ratio of classifiers that failed to classify this particular example.
.. raw:: html
    :file: ./images/example_1/bar.html

All the spikes are the mutual error examples, the complementary ones are the 0.33 bars, and the redundant ones are the empty spaces.

The data used to generate this graph is available in ``*-bar_plot_data.csv``
@@ -230,8 +263,7 @@ Classifier-dependant files

For each classifier, at least one file is generated, called ``*-summary.txt``.

.. include:: ./images/example_1/summary.txt
    :literal:
This regroups the useful information on the classifier's configuration and its performance. An interpretation section is available for classifiers that provide some interpretation-related information (such as feature importance).
Classification on doc_summit for generated_view_1 with decision_tree.
Database configuration :
- Database name : doc_summit
- View name : generated_view_1 View shape : (296, 3)
- Learning Rate : 0.75
- Labels used : label_1, label_2, label_3, label_4, label_5, label_6, label_7, label_8
- Number of cross validation folds : 2
Classifier configuration :
- DecisionTree with max_depth : 3, criterion : gini, splitter : best, random_state : RandomState(MT19937)
- Executed on 1 core(s)
For Accuracy score using {}, (higher is better) :
- Score on train : 0.5765765765765766
- Score on test : 0.5540540540540541
For F1 score using average: micro, {} (higher is better) :
- Score on train : 0.5765765765765766
- Score on test : 0.5540540540540541
Test set confusion matrix :
╒═════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╕
│ │ label_1 │ label_2 │ label_3 │ label_4 │ label_5 │ label_6 │ label_7 │ label_8 │
╞═════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│ label_1 │ 5 │ 0 │ 2 │ 0 │ 1 │ 1 │ 0 │ 0 │
├─────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ label_2 │ 1 │ 3 │ 1 │ 0 │ 1 │ 1 │ 1 │ 1 │
├─────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ label_3 │ 0 │ 1 │ 4 │ 2 │ 1 │ 1 │ 0 │ 0 │
├─────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ label_4 │ 2 │ 0 │ 1 │ 5 │ 1 │ 1 │ 0 │ 0 │
├─────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ label_5 │ 0 │ 0 │ 0 │ 1 │ 7 │ 0 │ 0 │ 1 │
├─────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ label_6 │ 0 │ 0 │ 0 │ 1 │ 1 │ 7 │ 0 │ 0 │
├─────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ label_7 │ 1 │ 1 │ 0 │ 0 │ 2 │ 1 │ 4 │ 1 │
├─────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ label_8 │ 0 │ 0 │ 2 │ 0 │ 0 │ 0 │ 1 │ 6 │
╘═════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╛
Classification took 0:00:00
Classifier Interpretation :
First feature :
2 <= 8.439512252807617
Feature importances :
- Feature index : 0, feature importance : 0.4223852373262582
- Feature index : 1, feature importance : 0.4039223639779588
- Feature index : 2, feature importance : 0.17369239869578307
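
To relate this summary to plain scikit-learn, the monoview classifier above is, up to the platform's wrapping, equivalent to the following sketch (the data here is a random placeholder standing in for ``generated_view_1``) :

.. code-block:: python

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # placeholder data standing in for the train split of generated_view_1
    rng = np.random.RandomState(42)
    X_train, y_train = rng.rand(222, 3), rng.randint(8, size=222)

    # configuration taken from the summary above
    clf = DecisionTreeClassifier(max_depth=3, criterion="gini",
                                 splitter="best", random_state=rng)
    clf.fit(X_train, y_train)
    print(clf.feature_importances_)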
@@ -12,7 +12,7 @@ To sum up what you need to run the platform :

Launching the setup tool
------------------------

Run in a terminal the following command, in the ``multiview-machine-learning-omis`` directory

.. code-block:: shell
...
@@ -3,15 +3,15 @@
# Enable logging
log: True
# The name of each dataset in the directory on which the benchmark should be run
name: ["doc_summit"]
# A label for the result directory
label: "example_1"
# The type of dataset, currently supported ".hdf5", and ".csv"
file_type: ".hdf5"
# The views to use in the benchmark, an empty value will result in using all the views
views:
# The path to the directory where the datasets are stored, an absolute path is advised
pathf: "examples/data/"
# The niceness of the processes, useful to lower their priority
nice: 0
# The random state of the benchmark, useful for reproducibility
@@ -21,10 +21,7 @@ nb_cores: 1
# Used to run the benchmark on the full dataset
full: True
# Used to be able to run more than one benchmark per minute
debug: False
# The directory in which the results will be stored, an absolute path is advised
res_dir: "examples/results/example_1/"
# If an error occurs in a classifier, if track_tracebacks is set to True, the
@@ -35,19 +32,19 @@ track_tracebacks: True
# All the classification-related configuration options
# The ratio of test examples/number of train examples
split: 0.25
# The number of folds in the cross validation process when hyper-parameter optimization is performed
nb_folds: 2
# The number of classes to select in the dataset
nb_class:
# The name of the classes to select in the dataset
classes:
# The type of algorithms to run during the benchmark (monoview and/or multiview)
type: ["monoview","multiview"]
# The name of the monoview algorithms to run, ["all"] to run all the available classifiers
algos_monoview: ["decision_tree"]
# The names of the multiview algorithms to run, ["all"] to run all the available classifiers
algos_multiview: ["weighted_linear_early_fusion", "weighted_linear_late_fusion",]
# The number of times the benchmark is repeated with different train/test
# split, to have more statistically significant results
stats_iter: 1
@@ -55,10 +52,27 @@ stats_iter: 1
metrics:
  accuracy_score: {}
  f1_score:
    average: "micro"
# The metric that will be used in the hyper-parameter optimization process
metric_princ: "accuracy_score"
# The type of hyper-parameter optimization method
hps_type: "None"
# The number of iterations in the hyper-parameter optimization process
hps_args: {}
### Configuring the hyper-parameters for the classifiers
decision_tree:
  max_depth: 3

weighted_linear_early_fusion:
  monoview_classifier_name: "decision_tree"
  monoview_classifier_config:
    decision_tree:
      max_depth: 6

weighted_linear_late_fusion:
  classifiers_names: "decision_tree"
  classifier_configs:
    decision_tree:
      max_depth: 2
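
# Note : each of these blocks is passed to the corresponding classifier's
# constructor ; for instance decision_tree with max_depth: 3 builds, roughly,
# scikit-learn's DecisionTreeClassifier(max_depth=3) for each monoview run
# (our reading of the summary files, where the sklearn parameters criterion,
# splitter and random_state also appear).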
@@ -9,8 +9,8 @@ class BaseFusionClassifier():

    def init_monoview_estimator(self, classifier_name, classifier_config,
                                classifier_index=None, multiclass=False):
        if classifier_index is not None:
            if classifier_config is not None:
                classifier_configs = classifier_config
            else:
                classifier_configs = None
        else:
...
@@ -253,12 +253,14 @@ def plot_errors_bar(error_on_examples, nb_examples, file_name,
""" """
fig, ax = plt.subplots() fig, ax = plt.subplots()
x = np.arange(nb_examples) x = np.arange(nb_examples)
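    # error_on_examples seems to hold, for each example, the ratio of
    # classifiers that classified it correctly, hence the 1 - error_on_examples
    # failure ratio announced in the title (our reading of this fix)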
    plt.bar(x, 1-error_on_examples)
    plt.title("Number of classifiers that failed to classify each example")
    fig.savefig(file_name + "error_analysis_bar.png", transparent=True)
    plt.close()
    if use_plotly:
        fig = plotly.graph_objs.Figure([plotly.graph_objs.Bar(x=example_ids, y=1-error_on_examples)])
        fig.update_layout(paper_bgcolor='rgba(0,0,0,0)',
                          plot_bgcolor='rgba(0,0,0,0)')
        plotly.offline.plot(fig, filename=file_name + "error_analysis_bar.html",
                            auto_open=False)
...
@@ -529,15 +529,15 @@ class HDF5Dataset(Dataset):
                        self.example_ids)[
                        example_indices].astype(
                        np.dtype("S100")))
        else:
            new_dataset_file["Metadata"].create_dataset(
                "example_ids",
                (len(self.example_ids),),
                data=np.array(self.example_ids).astype(np.dtype("S100")),
                dtype=np.dtype("S100"))
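            # np.dtype("S10") silently truncated ids longer than 10 bytes
            # (e.g. "Mutual_error_42" would become "Mutual_err") ; "S100" keeps
            # up to 100 bytes per example id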
new_dataset_file["Metadata"].attrs["datasetLength"] = len( new_dataset_file["Metadata"].attrs["datasetLength"] = len(
example_indices) example_indices)
new_dataset_file["Metadata"].attrs["nbClass"] = np.unique(labels) new_dataset_file["Metadata"].attrs["nbClass"] = np.unique(labels)
...