# Supervised MultiModal Integration Tool

This project aims to be an easy-to-use solution for running a prior benchmark on a dataset and evaluating the capacity of mono- and multi-view algorithms to classify it correctly.
## Getting Started

### Prerequisites (will be automatically installed)

To be able to use this project, you'll need the following Python modules:
- numpy, scipy,
- matplotlib - Used to plot results,
- sklearn - Used for the monoview classifiers,
- joblib - Used to compute on multiple threads,
- h5py - Used to generate HDF5 datasets on hard drive and use them to spare RAM,
- pickle - Used to store some results,
- pandas - Used to manipulate data efficiently,
- six - Python 2 and 3 compatibility utilities,
- m2r - Used to generate documentation from the readme,
- docutils - Used to generate documentation,
- pyyaml - Used to read the config files,
- plotly - Used to generate interactive HTML visuals,
- tabulate - Used to generate the confusion matrix.
### Installing

Once you have cloned the project from the GitLab repository, you just have to run:
```
cd path/to/summit/
pip install -e .
```

in the `summit` directory to install SuMMIT and its dependencies.
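As a quick, unofficial sanity check (not part of the documented workflow), the installed package should then be importable from Python:

```python
# If the editable install succeeded, the package used in the examples
# below can be imported and located on disk.
import multiview_platform

print(multiview_platform.__file__)
```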
### Running on simulated data

In order to try it out, you can run SuMMIT on simulated data with the following commands:
```python
from multiview_platform.execute import execute
execute()
```
This will run the first example. For more information about the examples, see the documentation.
Results will be stored in the results directory of the installation path: `path/to/install/summit/multiview_platform/examples/results`.
The documentation proposes a detailed interpretation of the results.
### Discovering the arguments
All the arguments of the platform are stored in a YAML config file. Some config files are given as examples.
The file stored in `summit/config_files/config.yml` is documented, and it is highly recommended to read it carefully before playing around with the parameters.
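As an illustration only (this is not part of the platform's documented workflow), the config file can be inspected programmatically with pyyaml before editing it; the path below is the default location mentioned above:

```python
# A minimal sketch: load the documented example config with pyyaml and
# list its top-level parameters and their default values.
import yaml

with open("summit/config_files/config.yml") as config_file:
    config = yaml.safe_load(config_file)

for parameter, value in config.items():
    print(f"{parameter}: {value}")
```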
You can create your own configuration file. In order to run the platform with it, run:
```python
from multiview_platform.execute import execute
execute(config_path="/absolute/path/to/your/config/file")
```
For further information about classifier-specific arguments, see the documentation.
## Dataset compatibility
In order to start a benchmark on your own dataset, you need to format it so SuMMIT can use it.
<!--
#### If you have multiple .csv files, you must organize them as:
- top_directory/database_name-labels.csv
- top_directory/database_name-labels-names.csv
- top_directory/Views/view_name.csv or top_directory/Views/view_name-s.csv if the view is sparse
-->
If you already have an HDF5 dataset file, it must be formatted as follows:

- One dataset for each view, called `ViewI` with `I` being the view index, with 2 attributes:
  - `attrs["name"]`, a string for the name of the view,
  - `attrs["sparse"]`, a boolean specifying whether the view is sparse or not (WIP).
- One dataset for the labels, called `Labels`, with one attribute:
  - `attrs["names"]`, a list of strings encoded in utf-8 naming the labels in the right order.
- One group for the additional data, called `Metadata`, containing at least 1 dataset:
  - `"example_ids"`, a numpy array of type `S100`, with the ids of the examples in the right order,

  and three attributes:
  - `attrs["nbView"]`, an int counting the total number of views in the dataset,
  - `attrs["nbClass"]`, an int counting the total number of different labels in the dataset,
  - `attrs["datasetLength"]`, an int counting the total number of examples in the dataset.
The `format_dataset.py` file is documented and can be used to format a multiview dataset into a SuMMIT-compatible HDF5 file.
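If you prefer to build the file by hand, here is a minimal sketch of writing such a file directly with h5py. The view names, shapes, and labels below are made-up placeholders; the snippet only illustrates the structure described above:

```python
# Minimal sketch of a SuMMIT-compatible HDF5 file, following the layout
# described above. All names and values are illustrative placeholders.
import h5py
import numpy as np

n_examples = 5

with h5py.File("my_dataset.hdf5", "w") as dataset_file:
    # One dataset per view, named ViewI.
    for view_index, (view_name, n_features) in enumerate([("view_a", 10),
                                                           ("view_b", 20)]):
        view = dataset_file.create_dataset(
            "View{}".format(view_index),
            data=np.random.rand(n_examples, n_features))
        view.attrs["name"] = view_name
        view.attrs["sparse"] = False

    # One dataset for the labels, with the label names as an attribute.
    labels = dataset_file.create_dataset(
        "Labels", data=np.array([0, 1, 0, 1, 1]))
    labels.attrs["names"] = [name.encode("utf-8")
                             for name in ["negative", "positive"]]

    # The Metadata group: example ids plus the three counting attributes.
    metadata = dataset_file.create_group("Metadata")
    metadata.create_dataset(
        "example_ids",
        data=np.array(["example_{}".format(i) for i in range(n_examples)],
                      dtype="S100"))
    metadata.attrs["nbView"] = 2
    metadata.attrs["nbClass"] = 2
    metadata.attrs["datasetLength"] = n_examples
```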
### Running on your dataset

Once you have formatted your dataset, to run SuMMIT on it you need to modify the config file as follows:

```yaml
name: ["your_file_name"]
pathf: "path/to/your/dataset"
```
This will run a full benchmark on your dataset using all available views and labels.
It is highly recommended to follow the documentation's tutorials to learn the use of each parameter.
## Author
- Baptiste BAUVIN
## Contributors
- Dominique BENIELLI
- Alexis PROD'HOMME