    Supervised MultiModal Integration Tool

    This project aims to be an easy-to-use solution for running a preliminary benchmark on a dataset and evaluating the capacity of mono- and multi-view algorithms to classify it correctly.

    Getting Started

    Prerequisites (will be automatically installed)

    To be able to use this project, you will need the following Python modules:

    • numpy, scipy,
    • matplotlib - Used to plot results,
    • sklearn - Used for the monoview classifiers,
    • joblib - Used to compute on multiple threads,
    • h5py - Used to generate HDF5 datasets on the hard drive and use them to spare RAM,
    • pickle - Used to store some results,
    • pandas - Used to manipulate data efficiently,
    • six,
    • m2r - Used to generate documentation from the readme,
    • docutils - Used to generate documentation,
    • pyyaml - Used to read the config files,
    • plotly - Used to generate interactive HTML visuals,
    • tabulate - Used to generate the confusion matrix.

    Installing

    Once you have cloned the project from the GitLab repository, install SuMMIT and its dependencies by running:

    cd path/to/summit/
    pip install -e .

    Running on simulated data

    To get started, try SuMMIT on simulated data with the following command:

    from multiview_platform.execute import execute
    execute()

    This will run the first example. For more information about the examples, see the documentation. Results will be stored in the results directory of the installation path: path/to/install/summit/multiview_platform/examples/results. The documentation provides a detailed interpretation of the results.
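
    A quick way to see what a run produced is to list that results directory. The sketch below only reuses the installation path quoted above, which is a placeholder you should adapt:

    import os

    # Placeholder: replace with the results directory of your installation.
    results_dir = "path/to/install/summit/multiview_platform/examples/results"
    for entry in sorted(os.listdir(results_dir)):
        print(entry)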

    Discovering the arguments

    All the arguments of the platform are stored in a YAML config file. Some config files are given as examples. The file stored in summit/config_files/config.yml is documented and it is highly recommended to read it carefully before playing around with the parameters.
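
    Since the configuration is plain YAML, you can also load it with pyyaml (listed in the prerequisites) to get a quick overview of the available parameters. The path below is the one quoted above; adjust it to where the file lives in your clone:

    import yaml

    # Placeholder path: the documented example config shipped with SuMMIT.
    with open("summit/config_files/config.yml") as config_file:
        config = yaml.safe_load(config_file)

    # Print every top-level parameter and its value.
    for parameter, value in config.items():
        print(parameter, ":", value)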

    You can create your own configuration file. To run the platform with it, use:

    from multiview_platform.execute import execute
    execute(config_path="/absolute/path/to/your/config/file")

    For further information about classifier-specific arguments, see the documentation.

    Dataset compatibility

    In order to start a benchmark on your own dataset, you need to format it so SuMMIT can use it.

    If you already have an HDF5 dataset file, it must be formatted as follows:
    • One dataset for each view, called ViewI with I being the view index, with two attributes:

      • attrs["name"], a string giving the name of the view
      • attrs["sparse"], a boolean specifying whether the view is sparse or not (WIP)
    • One dataset for the labels, called Labels, with one attribute:

      • attrs["names"], a list of utf-8-encoded strings naming the labels in the right order
    • One group for the additional data, called Metadata, containing at least one dataset:

      • "example_ids", a numpy array of type S100 with the ids of the examples in the right order
    • And three attributes:

      • attrs["nbView"], an int counting the total number of views in the dataset
      • attrs["nbClass"], an int counting the total number of different labels in the dataset
      • attrs["datasetLength"], an int counting the total number of examples in the dataset

    The format_dataset.py file is documented and can be used to format a multiview dataset into a SuMMIT-compatible HDF5 file.
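
    If you prefer to write the file yourself, the sketch below shows how the structure described above could be created with h5py. It is only an illustration: the file name, view contents, and label names are placeholders, and it assumes the three global attributes live on the Metadata group.

    import h5py
    import numpy as np

    # Placeholder data: two views, three examples, two classes.
    views = [np.random.rand(3, 5), np.random.rand(3, 8)]
    labels = np.array([0, 1, 0])

    with h5py.File("my_dataset.hdf5", "w") as dataset_file:
        # One dataset per view, named ViewI, with its "name" and "sparse" attributes.
        for index, view in enumerate(views):
            view_dataset = dataset_file.create_dataset("View{}".format(index), data=view)
            view_dataset.attrs["name"] = "view_{}".format(index)
            view_dataset.attrs["sparse"] = False

        # The Labels dataset, with the label names in the right order.
        labels_dataset = dataset_file.create_dataset("Labels", data=labels)
        labels_dataset.attrs["names"] = [name.encode("utf-8") for name in ["class_0", "class_1"]]

        # The Metadata group, with the example ids and the three global attributes.
        metadata = dataset_file.create_group("Metadata")
        metadata.create_dataset("example_ids",
                                data=np.array(["ex_0", "ex_1", "ex_2"], dtype="S100"))
        metadata.attrs["nbView"] = len(views)
        metadata.attrs["nbClass"] = 2
        metadata.attrs["datasetLength"] = labels.shape[0]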

    Running on your dataset

    Once you have formatted your dataset, to run SuMMIT on it you need to modify the config file as follows:

    name: ["your_file_name"]
    pathf: "path/to/your/dataset"

    This will run a full benchmark on your dataset using all available views and labels.

    It is highly recommended to follow the documentation's tutorials to learn the use of each parameter.

    Author

    • Baptiste BAUVIN

    Contributors

    • Dominique BENIELLI
    • Alexis PROD'HOMME