Commit f2960058 authored by Baptiste Bauvin

Doc and corrections

parent 8e48ae02
Pipeline #4420 passed
Showing with 370 additions and 107 deletions
@@ -14,4 +14,8 @@ build*
dist*
multiview_platform/.idea/*
.gitignore
multiview_platform/examples/results/example_1/*
multiview_platform/examples/results/example_2/*
multiview_platform/examples/results/example_3/*
multiview_platform/examples/results/example_4/*
multiview_platform/examples/results/example_5/*
@@ -79,17 +79,17 @@ In order to start a benchmark on your own dataset, you need to format it so SuMM
[comment]: <> (With `top_directory` being the last directory in the `pathF` argument)
##### If you already have an HDF5 dataset file, it must be formatted as :
* One dataset for each view called `ViewI`, with `I` being the view index, with 3 attributes :
    * `attrs["name"]` a string for the name of the view
    * `attrs["sparse"]` a boolean specifying whether the view is sparse or not (WIP)
    * `attrs["limits"]` a `np.array` containing all the limits of the attributes in the view (for ex. : for a pixel the limits will be `[0, 255]`, for a real attribute in [-1,1], the limits will be `[-1,1]`).
* One dataset for the labels called `Labels` with one attribute :
    * `attrs["names"]` a list of strings encoded in utf-8 naming the labels in the right order
* One group for the additional data called `Metadata` containing at least 1 dataset :
    * `"example_ids"`, a numpy array of type `S100`, with the ids of the examples in the right order
* The `Metadata` group must also bear three attributes :
    * `attrs["nbView"]` an int counting the total number of views in the dataset
    * `attrs["nbClass"]` an int counting the total number of different labels in the dataset
    * `attrs["datasetLength"]` an int counting the total number of examples in the dataset
@@ -115,4 +115,5 @@ It is highly recommended to follow the documentation's [tutorials](http://baptis
### Contributors
* **Dominique BENIELLI**
* **Alexis PROD'HOMME**
# The base configuration of the benchmark
log: True
name: ["doc_summit",]
label: "_"
file_type: ".hdf5"
views:
pathf: "/home/baptiste/Documents/Gitwork/multiview_generator/demo/"
nice: 0
random_state: 42
nb_cores: 1
full: False
debug: True
add_noise: False
noise_std: 0.0
@@ -17,10 +17,10 @@ track_tracebacks: False
# All the classification-related configuration options
multiclass_method: "oneVersusOne"
split: 0.5
nb_folds: 2
nb_class: 2
classes: ["label_1", "label_6"]
type: ["monoview", "multiview"]
algos_monoview: ["decision_tree"]
algos_multiview: ["weighted_linear_early_fusion", "weighted_linear_late_fusion"]
...
Welcome to Supervised MultiModal Integration Tool's documentation !
===================================================================

This package has been designed as an easy-to-use platform to estimate different mono- and multi-view classifiers' performances on a multiview dataset.
The main advantage of the platform is that it allows the user to add and remove a classifier without modifying its core code (the procedure is described thoroughly in this documentation).
This documentation consists of a short readme, with instructions to install and get started with SuMMIT, then several use cases to discover the features, and all the documented sources.

.. note::
    The documentation, the platform and the tests are constantly being updated.
    All the content labelled WIP is Work In Progress.

.. toctree::
    :maxdepth: 1

    readme_link
    tutorials/index
...
.. role:: python(code)
    :language: python

.. role:: yaml(code)
    :language: yaml
===============================================
Example 1 : First steps with SuMMIT
===============================================

Context
@@ -14,15 +19,52 @@ Adding a new classifier (monoview and/or multiview) to the benchmark as been mad
customize the set of classifiers and test their performances in a controlled environment.

Introduction to this tutorial
-----------------------------
This tutorial will show you how to use the platform on simulated data, for the simplest problem : vanilla multiclass classification.
The data is naively generated with a soon-to-be published multiview generator that allows control of the redundancy, mutual error and complementarity among the views.
For all the tutorials, we will use the same dataset.
A generated dataset to rule them all
------------------------------------

The dataset that will be used in the examples consists of :
+ 500 examples that are either

    + mis-described by all the views (labelled ``Mutual_error_*``),
    + well-described by all the views (labelled ``Redundant_*``),
    + well-described by the majority of the views (labelled ``Complementary_*``),
    + randomly well- or mis-described by the views (labelled ``example_*``).

+ 8 balanced classes named ``'label_1'``, ..., ``'label_8'``,
+ 4 views named ``'generated_view_1'``, ..., ``'generated_view_4'``,
+ each view consisting of 3 features.
It has been parametrized with the following error matrix :
+---------+--------+--------+--------+--------+
| | View 1 | View 2 | View 3 | View 4 |
+=========+========+========+========+========+
| label_1 | 0.40 | 0.40 | 0.40 | 0.40 |
+---------+--------+--------+--------+--------+
| label_2 | 0.55 | 0.40 | 0.40 | 0.40 |
+---------+--------+--------+--------+--------+
| label_3 | 0.40 | 0.50 | 0.60 | 0.55 |
+---------+--------+--------+--------+--------+
| label_4 | 0.40 | 0.50 | 0.50 | 0.40 |
+---------+--------+--------+--------+--------+
| label_5 | 0.40 | 0.40 | 0.40 | 0.40 |
+---------+--------+--------+--------+--------+
| label_6 | 0.40 | 0.40 | 0.40 | 0.40 |
+---------+--------+--------+--------+--------+
| label_7 | 0.40 | 0.40 | 0.40 | 0.40 |
+---------+--------+--------+--------+--------+
So this means that view 1 should make at least 40% error on label 1 and 55% error on label 2.
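
To take a quick look at the dataset file itself, here is a minimal sketch with ``h5py``, following the HDF5 format described in the readme (the path is deduced from the example configuration and may differ on your setup) :

.. code-block:: python

    import h5py

    # path assumed : pathf + name + file_type from the example's config file
    with h5py.File("examples/data/doc_summit.hdf5", "r") as dataset:
        print(dataset["Metadata"].attrs["nbView"])         # 4
        print(dataset["Metadata"].attrs["nbClass"])        # 8
        print(dataset["Metadata"].attrs["datasetLength"])  # 500
        print(dataset["View0"].attrs["name"])  # e.g. 'generated_view_1'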
Getting started
---------------

@@ -41,25 +83,26 @@ We will decrypt the main arguments :
+ The first part regroups the basics :

    - :yaml:`log: True` allows the log to be printed in the terminal,
    - :yaml:`name: ["doc_summit"]` uses the simulated dataset described above,
    - :yaml:`random_state: 42` fixes the seed of the random state for this benchmark, it is useful for reproducibility,
    - :yaml:`full: True` means the benchmark will use the full dataset,
    - :yaml:`res_dir: "examples/results/example_1/"` saves the results in ``multiview-machine-learning-omis/multiview_platform/examples/results/example_1``
+ Then the classification-related arguments :

    - :yaml:`split: 0.25` means that 25% of the dataset will be used to test the different classifiers and 75% to train them,
    - :yaml:`type: ["monoview", "multiview"]` allows for monoview and multiview algorithms to be used in the benchmark,
    - :yaml:`algos_monoview: ["decision_tree"]` runs a decision tree on each view,
    - :yaml:`algos_multiview: ["weighted_linear_early_fusion", "weighted_linear_late_fusion"]` runs an early and a late fusion,
    - The metrics configuration ::

        metrics:
          accuracy_score: {}
          f1_score:
            average: "micro"
means that the benchmark will evaluate the performance of each algorithm on accuracy, and on f1-score with a micro average (because of the multiclass setting).
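
As a reminder, in the multiclass single-label setting the micro-averaged f1-score counts the true and false positives over all the classes at once, so it boils down to the accuracy; a quick check with scikit-learn (made-up predictions) :

.. code-block:: python

    from sklearn.metrics import f1_score

    y_true = [0, 1, 2, 2, 1, 0]
    y_pred = [0, 2, 2, 2, 1, 1]
    # 4 correct predictions out of 6
    print(f1_score(y_true, y_pred, average="micro"))  # 0.666...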
**Start the benchmark**
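
Assuming SuMMIT is installed, running the example from the project's root directory should look like the following sketch (the ``execute`` entry point is our assumption here; check the installation tutorial for the exact invocation) :

.. code-block:: python

    from multiview_platform.execute import execute

    # runs the benchmark with the configuration file of example 1
    execute("example 1")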
@@ -76,9 +119,8 @@ The execution should take less than five minutes. We will first analyze the resu

The result structure can be startling at first, but, as the platform provides a lot of information, it has to be organized.
The results are stored in ``multiview_platform/examples/results/example_1/``. Here, you will find a directory with the name of the database used for the benchmark, here : ``doc_summit/``
Then, a directory with the date and time of the beginning of the experiment. Let's say you started the benchmark on the 25th of December 1560,
at 03:42 PM, the directory's name should be ``started_1560_12_25-15_42/``.
From here the result directory has the structure that follows :

@@ -86,23 +128,13 @@ From here the result directory has the structure that follows :

.. code-block:: bash
    | started_1560_12_25-15_42
    | ├── decision_tree
    | | ├── generated_view_1
    | | | ├── *-summary.txt
    | | | ├── <other classifier dependant files>
    | | ├── generated_view_2
    | | | ├── *-summary.txt
    | | | ├── <other classifier dependant files>
    | ├── [..
    | ├── ..]
    | ├── weighted_linear_late_fusion
@@ -137,27 +169,31 @@ From here the result directory has the structure that follows :

    | └── random_state.pickle
The structure may seem complex, but it provides a lot of information, from the most general to the most precise.

Let's comment each file :
``*-accuracy_score*.html``, ``*-accuracy_score*.png`` and ``*-accuracy_score*.csv``
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
These files contain the scores of each classifier for the accuracy metric, ordered with the best ones on the right and the worst ones on the left, as an interactive html page, an image or a csv matrix. The star after ``accuracy_score*`` means that it was the principal metric (the usefulness of the principal metric will be explained later).

The html version is as follows :
.. raw:: html
    :file: ./images/example_1/accuracy.html
This is a bar plot showing the score on the training set (light gray) and on the testing set (black) for each monoview classifier on each view, and for each multiview classifier.
Here, the generated dataset is built to introduce some complementarity amongst the views. As a consequence, the two multiview algorithms, even if they are naive, have a better score than the decision trees.
The ``.csv`` file is a matrix with the score on train stored in the first row and the score on test stored in the second one. Each classifier is presented in a row. It is loadable with pandas.
A similar graph, ``*-accuracy_score*-class.html``, reports the error of each classifier on each class.

.. raw:: html
    :file: ./images/example_1/accuracy_class.html

Here, for each classifier, 8 bars are plotted, one for each class. It is clear that for the monoview algorithms, in views 2 and 3, the third class is difficult, as shown in the error matrix.
``*-error_analysis_2D.png`` and ``*-error_analysis_2D.html``
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

@@ -169,17 +205,20 @@ Below, ``*-error_analysis_2D.html`` is displayed.

It is the representation of a matrix, where the rows are the examples, and the columns are the classifiers.
The examples labelled as ``Mutual_error_*`` are mis-classified by most of the algorithms, the redundant ones are well-classified, and the complementary ones are classified with mixed success.

.. note::
    It is highly recommended to zoom in the html figure to see each row.
.. raw:: html
    :file: ./images/example_1/error_2d.html

This figure is the html version of the classifiers errors' visualization. It is interactive, so, by hovering it, the information on
each classifier and example is printed. The classifiers are ordered as follows:
From left to right : all the monoview classifiers on the first view, all the ones on the second one, ..., then at the far right, the multiview classifiers.

This html image is also available in ``.png`` format, but is then not interactive, so harder to analyze.
@@ -190,23 +229,17 @@ It could mean that the example is incorrectly labeled in the dataset or is very

Symmetrically, a mainly-black column means that a classifier spectacularly failed on the asked task.
The data used to generate those matrices is available in ``*-2D_plot_data.csv``
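
For instance, here is a sketch ranking the examples by failure ratio (the path is a placeholder, and the layout, rows for examples and columns for classifiers with 1 for a well-classified example, is an assumption to check on your version) :

.. code-block:: python

    import pandas as pd

    # placeholder path ; assumed layout : rows = examples, columns = classifiers
    data = pd.read_csv("path/to/2D_plot_data.csv", index_col=0)
    failure_ratio = 1 - data.mean(axis=1)  # assumes 1 = well-classified
    print(failure_ratio.sort_values(ascending=False).head())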
``*-error_analysis_bar.png`` and ``*-error_analysis_bar.html``
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

This file is a different way to visualize the same information as the two previous ones. Indeed, it is a bar plot, with a bar for each example, counting the ratio of classifiers that failed to classify this particular example.
.. raw:: html
    :file: ./images/example_1/bar.html

All the spikes are the mutual error examples, the complementary ones are the 0.33 bars, and the redundant ones are the empty spaces.

The data used to generate this graph is available in ``*-bar_plot_data.csv``
@@ -230,8 +263,7 @@ Classifier-dependant files

For each classifier, at least one file is generated, called ``*-summary.txt``.

.. include:: ./images/example_1/summary.txt
    :literal:
This regroups the useful information on the classifier's configuration and its performance. An interpretation section is available for classifiers that provide some interpretation-related information (such as feature importance).
Classification on doc_summit for generated_view_1 with decision_tree.
Database configuration :
- Database name : doc_summit
- View name : generated_view_1 View shape : (296, 3)
- Learning Rate : 0.75
- Labels used : label_1, label_2, label_3, label_4, label_5, label_6, label_7, label_8
- Number of cross validation folds : 2
Classifier configuration :
- DecisionTree with max_depth : 3, criterion : gini, splitter : best, random_state : RandomState(MT19937)
- Executed on 1 core(s)
For Accuracy score using {}, (higher is better) :
- Score on train : 0.5765765765765766
- Score on test : 0.5540540540540541
For F1 score using average: micro, {} (higher is better) :
- Score on train : 0.5765765765765766
- Score on test : 0.5540540540540541
Test set confusion matrix :
╒═════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╕
│ │ label_1 │ label_2 │ label_3 │ label_4 │ label_5 │ label_6 │ label_7 │ label_8 │
╞═════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│ label_1 │ 5 │ 0 │ 2 │ 0 │ 1 │ 1 │ 0 │ 0 │
├─────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ label_2 │ 1 │ 3 │ 1 │ 0 │ 1 │ 1 │ 1 │ 1 │
├─────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ label_3 │ 0 │ 1 │ 4 │ 2 │ 1 │ 1 │ 0 │ 0 │
├─────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ label_4 │ 2 │ 0 │ 1 │ 5 │ 1 │ 1 │ 0 │ 0 │
├─────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ label_5 │ 0 │ 0 │ 0 │ 1 │ 7 │ 0 │ 0 │ 1 │
├─────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ label_6 │ 0 │ 0 │ 0 │ 1 │ 1 │ 7 │ 0 │ 0 │
├─────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ label_7 │ 1 │ 1 │ 0 │ 0 │ 2 │ 1 │ 4 │ 1 │
├─────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ label_8 │ 0 │ 0 │ 2 │ 0 │ 0 │ 0 │ 1 │ 6 │
╘═════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╛
Classification took 0:00:00
Classifier Interpretation :
First feature :
2 <= 8.439512252807617
Feature importances :
- Feature index : 0, feature importance : 0.4223852373262582
- Feature index : 1, feature importance : 0.4039223639779588
- Feature index : 2, feature importance : 0.17369239869578307
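
To relate this summary to plain scikit-learn, the monoview classifier above is, up to the platform's wrapping, equivalent to the following sketch (the data here is a random placeholder standing in for ``generated_view_1``) :

.. code-block:: python

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # placeholder data standing in for the train split of generated_view_1
    rng = np.random.RandomState(42)
    X_train, y_train = rng.rand(222, 3), rng.randint(8, size=222)

    # configuration taken from the summary above
    clf = DecisionTreeClassifier(max_depth=3, criterion="gini",
                                 splitter="best", random_state=rng)
    clf.fit(X_train, y_train)
    print(clf.feature_importances_)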
@@ -12,7 +12,7 @@ To sum up what you need to run the platform :

Launching the setup tool
------------------------

Run in a terminal the following command, in the ``multiview-machine-learning-omis`` directory

.. code-block:: shell
...
@@ -3,15 +3,15 @@
# Enable logging
log: True
# The name of each dataset in the directory on which the benchmark should be run
name: ["doc_summit"]
# A label for the result directory
label: "example_1"
# The type of dataset, currently supported ".hdf5", and ".csv"
file_type: ".hdf5"
# The views to use in the benchmark, an empty value will result in using all the views
views:
# The path to the directory where the datasets are stored, an absolute path is advised
pathf: "examples/data/"
# The niceness of the processes, useful to lower their priority
nice: 0
# The random state of the benchmark, useful for reproducibility
@@ -21,10 +21,7 @@ nb_cores: 1
# Used to run the benchmark on the full dataset
full: True
# Used to be able to run more than one benchmark per minute
debug: False
# The directory in which the results will be stored, an absolute path is advised
res_dir: "examples/results/example_1/"
# If an error occurs in a classifier, if track_tracebacks is set to True, the
@@ -35,19 +32,19 @@ track_tracebacks: True
# All the classification-related configuration options
# The ratio of test examples/number of train examples
split: 0.25
# The number of folds in the cross validation process when hyper-parameter optimization is performed
nb_folds: 2
# The number of classes to select in the dataset
nb_class:
# The name of the classes to select in the dataset
classes:
# The type of algorithms to run during the benchmark (monoview and/or multiview)
type: ["monoview","multiview"]
# The name of the monoview algorithms to run, ["all"] to run all the available classifiers
algos_monoview: ["decision_tree"]
# The names of the multiview algorithms to run, ["all"] to run all the available classifiers
algos_multiview: ["weighted_linear_early_fusion", "weighted_linear_late_fusion",]
# The number of times the benchmark is repeated with different train/test
# split, to have more statistically significant results
stats_iter: 1
@@ -55,10 +52,27 @@ stats_iter: 1
metrics:
  accuracy_score: {}
  f1_score:
    average: "micro"
# The metric that will be used in the hyper-parameter optimization process
metric_princ: "accuracy_score"
# The type of hyper-parameter optimization method
hps_type: "None"
# The number of iterations in the hyper-parameter optimization process
hps_args: {}
### Configuring the hyper-parameters for the classifiers
decision_tree:
  max_depth: 3

weighted_linear_early_fusion:
  monoview_classifier_name: "decision_tree"
  monoview_classifier_config:
    decision_tree:
      max_depth: 6

weighted_linear_late_fusion:
  classifiers_names: "decision_tree"
  classifier_configs:
    decision_tree:
      max_depth: 2
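
# Note : each of these blocks is passed to the corresponding classifier's
# constructor ; for instance decision_tree with max_depth: 3 builds, roughly,
# scikit-learn's DecisionTreeClassifier(max_depth=3) for each monoview run
# (our reading of the summary files, where the sklearn parameters criterion,
# splitter and random_state also appear).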
@@ -9,8 +9,8 @@ class BaseFusionClassifier():

    def init_monoview_estimator(self, classifier_name, classifier_config,
                                classifier_index=None, multiclass=False):
        if classifier_index is not None:
            if classifier_config is not None:
                classifier_configs = classifier_config
            else:
                classifier_configs = None
        else:
...
@@ -253,12 +253,14 @@ def plot_errors_bar(error_on_examples, nb_examples, file_name,
""" """
fig, ax = plt.subplots() fig, ax = plt.subplots()
x = np.arange(nb_examples) x = np.arange(nb_examples)
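    # error_on_examples seems to hold, for each example, the ratio of
    # classifiers that classified it correctly, hence the 1 - error_on_examples
    # failure ratio announced in the title (our reading of this fix)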
    plt.bar(x, 1-error_on_examples)
    plt.title("Number of classifiers that failed to classify each example")
    fig.savefig(file_name + "error_analysis_bar.png", transparent=True)
    plt.close()
    if use_plotly:
        fig = plotly.graph_objs.Figure([plotly.graph_objs.Bar(x=example_ids, y=1-error_on_examples)])
        fig.update_layout(paper_bgcolor='rgba(0,0,0,0)',
                          plot_bgcolor='rgba(0,0,0,0)')
        plotly.offline.plot(fig, filename=file_name + "error_analysis_bar.html",
                            auto_open=False)
...
@@ -529,15 +529,15 @@ class HDF5Dataset(Dataset):
                        self.example_ids)[
                        example_indices].astype(
                        np.dtype("S100")))
        else:
            new_dataset_file["Metadata"].create_dataset(
                "example_ids",
                (len(self.example_ids),),
                data=np.array(self.example_ids).astype(np.dtype("S100")),
                dtype=np.dtype("S100"))
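            # np.dtype("S10") silently truncated ids longer than 10 bytes
            # (e.g. "Mutual_error_42" would become "Mutual_err") ; "S100" keeps
            # up to 100 bytes per example id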
new_dataset_file["Metadata"].attrs["datasetLength"] = len( new_dataset_file["Metadata"].attrs["datasetLength"] = len(
example_indices) example_indices)
new_dataset_file["Metadata"].attrs["nbClass"] = np.unique(labels) new_dataset_file["Metadata"].attrs["nbClass"] = np.unique(labels)
...