diff --git a/README.md b/README.md index b68b72a01e1db4bb1f9fbacf8e3bce3f625be1e8..55028c1a7cdf7d6062e11f5388dfb80d998ffa70 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@ [](http://www.gnu.org/licenses/gpl-3.0) -[](https://travis-ci.com/babau1/multiview-machine-learning-omis) +[](https://gitlab.lis-lab.fr/baptiste.bauvin/multiview-machine-learning-omis/badges/develop/build.svg) # Mono- and Multi-view classification benchmark This project aims to be an easy-to-use solution to run a prior benchmark on a dataset and evaluate mono- & multi-view algorithms' capacity to classify it correctly. @@ -21,30 +21,36 @@ And the following python modules : * [h5py](https://www.h5py.org) - Used to generate HDF5 datasets on hard drive and use them to spare RAM * [pickle](https://docs.python.org/3/library/pickle.html) - Used to store some results * ([graphviz](https://pypi.python.org/pypi/graphviz) - Used for decision tree interpretation) -* [pandas] +* [pandas](https://pandas.pydata.org/) - -They are all tested in `multiview-machine-mearning-omis/multiview_platform/MonoMutliViewClassifiers/Versions.py` which is automatically checked each time you run the `Exec` script +They are all tested in `multiview-machine-learning-omis/multiview_platform/MonoMultiViewClassifiers/Versions.py`, which is automatically checked each time you run the `execute` script ### Installing -No installation is needed, just the prerequisites. +cd into the project directory +and install the project : + +``` +cd multiview-machine-learning-omis +pip install -e . +``` ### Running on simulated data In order to run it you'll need to try on **simulated** data with the command ``` cd multiview-machine-learning-omis/multiview_platform -python Exec.py -log +python execute.py -log ``` -Results will be stored in `multiview-machine-learning-omis/multiview_platform/MonoMultiViewClassifiers/Results/` +Results will be stored in `multiview-machine-learning-omis/multiview_platform/mono_multi_view_classifiers/results/` If you want to run a multiclass (one versus one) benchmark on simulated data, use : ``` cd multiview-machine-learning-omis/multiview_platform -python Exec.py -log --CL_nbClass 3 +python execute.py -log --CL_nbClass 3 ``` -If no path is specified, simulated hdf5 datasets are stored in `multiview-machine-learning-omis/Data` +If no path is specified, simulated hdf5 datasets are stored in `multiview-machine-learning-omis/data` ### Discovering the arguments @@ -52,7 +58,7 @@ If no path is specified, simulated hdf5 datasets are stored in `multiview-machin In order to see all the arguments of this script, their description and default values run : ``` cd multiview-machine-learning-omis/multiview_platform -python Exec.py -h +python execute.py -h ``` The arguments can be passed through a file using `python Exec.py @<path_to_doc>` The file must be formatted with one newline instead of each space : @@ -67,14 +73,14 @@ SVM ``` Moreover, for Monoview algorithms (Multiview is still WIP), it is possible to pass multiple arguments instead of just one.
-Thus, executing `python Exec.py --RF_trees 10 100 --RF_max_depth 3 4 --RF_criterion entropy` will result in the generation of several classifiers called +Thus, executing `python execute.py --RF_trees 10 100 --RF_max_depth 3 4 --RF_criterion entropy` will result in the generation of several classifiers called `RandomForest_10_3_entropy`, with 10 trees and a max depth of 3, `RandomForest_10_4_entropy`, with 10 trees and a max depth of 4, `RandomForest_100_3_entropy`, `RandomForest_100_4_entropy` to test all the passed argument combinations. -### Understanding `Results/` architecture +### Understanding `results/` architecture -Results are stored in `multiview-machine-learning-omis/multiview_platform/MonoMultiViewClassifiers/Results/` +Results are stored in `multiview-machine-learning-omis/multiview_platform/mono_multi_view_classifiers/results/` A directory will be created with the name of the database used to run the script. Each time the script is run, a new directory named after the running date and time will be created. In that directory: @@ -82,7 +88,7 @@ In that directory: * If it is run with one iteration, the iteration results will be stored in the current directory The results for each iteration are graphs plotting the classifiers' scores; each classifier's config and results are stored in a directory of its own. -To explore the results run the `Exec` script and go in `multiview-machine-learning-omis/multiview_platform/MonoMultiViewClassifiers/Results/Plausible/` +To explore the results run the `execute` script and go in `multiview-machine-learning-omis/multiview_platform/mono_multi_view_classifiers/results/plausible/` ### Dataset compatibility @@ -118,7 +124,7 @@ One group for the additional data called `Metadata` containing at least 3 attrib In order to run the script on your dataset you need to use : ``` cd multiview-machine-learning-omis/multiview_platform -python Exec.py -log --name <your_dataset_name> --type <.cvs_or_.hdf5> --pathF <path_to_your_dataset> +python execute.py -log --name <your_dataset_name> --type <.csv_or_.hdf5> --pathF <path_to_your_dataset> ``` This will run a full benchmark on your dataset using all available views and labels. diff --git a/data/Plausible.hdf5 b/data/Plausible.hdf5 new file mode 100644 index 0000000000000000000000000000000000000000..947a30ea7b8b7213f075b8c03149c4efd0e5df28 Binary files /dev/null and b/data/Plausible.hdf5 differ diff --git a/data/Plausible0.hdf5 b/data/Plausible0.hdf5 new file mode 100644 index 0000000000000000000000000000000000000000..c7e0dd9d3a42182c5879b66d3ac225656171d2e0 Binary files /dev/null and b/data/Plausible0.hdf5 differ diff --git a/data/Plausible1.hdf5 b/data/Plausible1.hdf5 new file mode 100644 index 0000000000000000000000000000000000000000..c7e0dd9d3a42182c5879b66d3ac225656171d2e0 Binary files /dev/null and b/data/Plausible1.hdf5 differ diff --git a/docs/source/analyzeresult.rst b/docs/source/analyzeresult.rst index 2367d0d6d17114b02e7ae8770033eb9810088785..1477e2efa0768f3d3dc09bbb52e974d931d65391 100644 --- a/docs/source/analyzeresult.rst +++ b/docs/source/analyzeresult.rst @@ -1,5 +1,5 @@ Result analysis module ====================== -.. automodule:: multiview_platform.MonoMultiViewClassifiers.ResultAnalysis - :members: \ No newline at end of file +..
automodule:: multiview_platform.mono_multi_view_classifiers.result_analysis + :members: diff --git a/docs/source/api.rst b/docs/source/api.rst index d5bc51ec2f59e5cf9a482a0c29bfa8197f2b7703..367a94fb76c984ec0b6edd22efd21226821f20c6 100644 --- a/docs/source/api.rst +++ b/docs/source/api.rst @@ -2,9 +2,9 @@ Multiview Platform ================== .. toctree:: - :maxdepth: 1 + :maxdepth: 3 :caption: Contents: execution monomultidoc - analyzeresult \ No newline at end of file + analyzeresult diff --git a/docs/source/conf.py b/docs/source/conf.py index f3f304a15f51d16a586ecdba53437253a24699a9..d3f13a6cf5d11f25a49d1d9862abcbb88d8a01a9 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -18,11 +18,12 @@ # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. # -# import os -# import sys -# sys.path.insert(0, os.path.abspath('.')) - +import os +import sys +sys.path.insert(0, os.path.abspath('.')) +sys.path.insert(0, os.path.abspath('../../multiview_platform')) +sys.path.insert(0, os.path.abspath('../..')) # -- General configuration ------------------------------------------------ # If your documentation needs a minimal Sphinx version, state it here. @@ -37,17 +38,18 @@ add_module_names = False # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. extensions = ['sphinx.ext.autodoc', - 'sphinx.ext.doctest', - 'sphinx.ext.intersphinx', - 'sphinx.ext.todo', - 'nbsphinx', - 'sphinx.ext.coverage', - 'sphinx.ext.mathjax', - 'sphinx.ext.ifconfig', - 'sphinx.ext.viewcode', - 'sphinx.ext.githubpages', - 'sphinx.ext.napoleon', - 'm2r',] +# 'sphinx.ext.doctest', +# 'sphinx.ext.intersphinx', +# 'sphinx.ext.todo', +# 'nbsphinx', + 'sphinx.ext.coverage', + 'sphinx.ext.imgmath', +# 'sphinx.ext.mathjax', +# 'sphinx.ext.ifconfig', +# 'sphinx.ext.viewcode', +# 'sphinx.ext.githubpages', + 'sphinx.ext.napoleon', + 'm2r',] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] @@ -55,12 +57,12 @@ templates_path = ['_templates'] # The suffix(es) of source filenames. # You can specify multiple suffix as a list of string: # -source_suffix = {'.rst': 'restructuredtext', '.md': 'markdown'} +# source_suffix = {'.rst': 'restructuredtext', '.md': 'markdown'} # source_suffix = '.rst' -# source_suffix = ['.rst', '.md'] +source_suffix = ['.rst', '.md'] # source_parsers = { -# '.md': CommonMarkParser, +# '.md': CommonMarkParser, # } # The master toctree document. @@ -103,7 +105,8 @@ todo_include_todos = True # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. # -html_theme = 'sphinx_rtd_theme' +# html_theme = 'sphinx_rtd_theme' +html_theme = 'classic' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the diff --git a/docs/source/execution.rst b/docs/source/execution.rst index 3d26fece2aa89ea3212a2051624d9068f8e8b8fb..b465e63f09e2175ba485d45e2d82b79bab573baa 100644 --- a/docs/source/execution.rst +++ b/docs/source/execution.rst @@ -1,6 +1,6 @@ Welcome to the execution documentation ===================================== -.. automodule:: multiview_platform.Exec +..
automodule:: multiview_platform.execute :members: diff --git a/docs/source/monomulti/exec_classif.rst b/docs/source/monomulti/exec_classif.rst index fb379570eb8367796a8ecc95cd12877dcfb03d0b..31dd4af59cdf4e0aea31903f917fd3f475521abf 100644 --- a/docs/source/monomulti/exec_classif.rst +++ b/docs/source/monomulti/exec_classif.rst @@ -1,6 +1,6 @@ Classification execution module =============================== -.. automodule:: multiview_platform.MonoMultiViewClassifiers.ExecClassif +.. automodule:: multiview_platform.mono_multi_view_classifiers.exec_classif :members: - :inherited-members: \ No newline at end of file + :inherited-members: diff --git a/docs/source/monomulti/metrics.rst b/docs/source/monomulti/metrics.rst index c42b38c49b6529c78865f2ceacf212ae5b55f112..310b33ff6a38a7dca42e00e47b394390b084eb23 100644 --- a/docs/source/monomulti/metrics.rst +++ b/docs/source/monomulti/metrics.rst @@ -1,6 +1,6 @@ Metrics framework ================= -.. automodule:: multiview_platform.MonoMultiViewClassifiers.Metrics.framework +.. automodule:: multiview_platform.mono_multi_view_classifiers.metrics.framework :members: - :inherited-members: \ No newline at end of file + :inherited-members: diff --git a/docs/source/monomulti/monoview_classifier.ipynb b/docs/source/monomulti/monoview_classifier.ipynb deleted file mode 100644 index a7e85bbc180ab7c192038b3667fbd619ce872881..0000000000000000000000000000000000000000 --- a/docs/source/monomulti/monoview_classifier.ipynb +++ /dev/null @@ -1,100 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Monoview classifier framework" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## File addition" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "* In the `MonoviewClassifiers` package, you need to add a python module called after your monoview classifier (let's call it MOC for **MO**noview **C**lassifier)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## The `MOC.py` file" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this file, you need to add several functions for the platform to be able to use your classifier; they are all listed below : " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### `canProbas`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This function is just used to know if the classifier can predict a probability for each label instead of just predicting a label."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "def canProbas():\n", - " return True" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### `fit`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This function returns a fitted sklearn classifier object" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 2", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 2 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "2.7.13" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} diff --git a/docs/source/monomulti/multiview_classifier.ipynb b/docs/source/monomulti/multiview_classifier.ipynb deleted file mode 100644 index 734b0c79b56507b04073f1d682c1853037f4c186..0000000000000000000000000000000000000000 --- a/docs/source/monomulti/multiview_classifier.ipynb +++ /dev/null @@ -1,551 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# How to add a multiview classifier to the platform" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## File addition " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "* In the `Code/MonoMultiViewClassifiers/MultiviewClassifiers` package, add a new package named after your multiview classifier (let's call it NMC for New Multiview Classifier).\n", - "\n", - "* In this package (`Code/MonoMultiViewClassifiers/MultiviewClassifiers/NMC`), add a file called `NMCModule.py` and another one called `analyzeResults.py`. These will be the two files used by the platform to communicate with your implementation.\n", - "\n", - "* You can now add either a package named after your classifier `NMCPackage` and paste your files in it or just add a file with the same name if it is enough." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## `NMCModule.py`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Here we will list all the necessary functions of the python module to allow the platform to use NMC" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### The functions" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `getArgs`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This function is used to generate multiple arguments dictionaries from one benchmark entry. It must return the `argumentsList`, to which it must have added at least a dictionary containing all the necessary information to run NMC.
You must add all general fields about the type of classifier and a field called `NMCKWARGS` (`<classifier_name>KWARGS`) containing another dictionary with classifier-specific arguments (we assume here that NMC has two hyper-parameters : a set of weights and an integer) " - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "arguments = {\"CL_type\":\"NMC\", \n", - " \"views\":[\"all\", \"the\", \"views\", \"names\"],\n", - " \"NB_VIEW\":len([\"all\", \"the\", \"views\", \"names\"]), \n", - " \"viewsIndices\":[\"the indices\", \"of the\", \"views in\", \"the hdf5 file\"], \n", - " \"NB_CLASS\": \"the number of labels of the dataset\", \n", - " \"LABELS_NAMES\": [\"the names of\", \"the labels used\"], \n", - " \"NMCKWARGS\":{\"weights\":[], \n", - " \"integer\":42,\n", - " \"nbViews\":5}\n", - " }" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To fill these fields, you can use the default values given in argument of the function : " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "def getArgs(args, benchmark, views, viewsIndices, randomState, directory, resultsMonoview, classificationIndices):\n", - " argumentsList = []\n", - " nbViews = len(views)\n", - " arguments = {\"CL_type\": \"NMC\",\n", - " \"views\": views,\n", - " \"NB_VIEW\": len(views),\n", - " \"viewsIndices\": viewsIndices,\n", - " \"NB_CLASS\": len(args.CL_classes),\n", - " \"LABELS_NAMES\": args.CL_classes,\n", - " \"NMCKWARGS\": {\"weights\":[],\n", - " \"integer\":42,\n", - " \"nbViews\":5}}\n", - " argumentsList.append(arguments)\n", - " return argumentsList" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This function is also used to add the user-defined configuration for the classifier, but we will discuss it later" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `genName`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This function is used to generate a short string describing the classifier using its configuration." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "def genName(config):\n", - " return \"NMC\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Some classifiers, like some late fusion classifiers, will have more complicated `genName` functions that will need to summarize which monoview classifiers they use in a short string using the `config` argument that is exactly the dictionary called `\"NMCKWARGS\"` in the `getArgs` function" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `getBenchmark`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This function is used to generate the `benchmark` argument of `getArgs`. It stores all the different configurations that will have to be tested (does not include hyper-parameter sets). For example for the Mumbo classifier, it will store the list of possible algorithms to use as weak learners.
" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "def getBenchmark(benchmark, args=None):\n", - " benchmark[\"Multiview\"][\"NMC\"] = [\"Some NMC cnfigurations\"]\n", - " return benchmark" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `benchmark` argument is pre-generated with an entry for all the multiview classifiers so you just need to fill it with the different configurations" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `genParamsSets`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This function is used to generate random hyper-parameters sets to allow a randomized search to estimate the best one. It works in pair with the `setParams` method implemented in the classifier's class so you need to keep in mind the order of the hyper-paramters you used here.\n", - "\n", - "The `classificationKWARGS` argument is the `\"NMCKWARGS\"` entry seen earlier, and it is highly recommended to use the `randomState` object (which is described [here](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.RandomState.html)) to generate random numbers in order for the results to be reproductible\n", - "\n", - "Assuming our NMC classifier has 2 HP, one weight vector for each view and one integer that can be between 1 and 100, the `genParamsSets` function will look like :" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "def genParamsSets(classificationKWARGS, randomState, nIter=1):\n", - " weightsVector = [randomState.random_sample(classificationKWARGS[\"nbViews\"]) for _ in range(nIter)]\n", - " nomralizedWeights = [weights/np.sum(weights) for weights in weightsVector]\n", - " intsVector = list(randomState.randint(1,100,nIter))\n", - " paramsSets = [[normalizedWeight, integer] for normalizedWeight, interger in zip(normalizedWeights, intsVector)]\n", - " return paramsSets" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### The `NMC` class" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It has to be named after the classifier adding `Class` at the end of its name. " - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "class NMCClass:\n", - " pass" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `init` method" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "There is nothing specific to define in the `__init__` method, you just need to initialize the attributes of your classifier. The `kwargs` argument is the `NMCKWARGS` dictionary seen earlier. In our example, NMC uses two hyper parameters : weights and an int." 
- ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "def __init__(self, randomState, NB_CORES=1, **kwargs):\n", - " if kwargs[\"weights\"] == []:\n", - " self.weights = randomState.random_sample(kwargs[\"nbViews\"])\n", - " else:\n", - " self.weights = kwargs[\"weights\"]\n", - " self.weights /= np.sum(self.weights)\n", - " self.integer = kwargs[\"integer\"]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `setParams` method" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This method is used to tune your classifier with a set of hyper parameters. The set is a list ordered as in the `genParamsSets` function seen earlier. The input of the `setParams` method is a list of parameters in the right order. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "def setParams(self, paramsSet):\n", - " self.weights = paramsSet[0]\n", - " self.integer = paramsSet[1]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `fit_hdf5` method" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This method is generally the same as `sklearn`'s `fit` method but uses as an input an HDF5 dataset in order to lower the memory usage of the whole platform.\n", - "* The `DATASET` object is an HDF5 dataset file containing all the views and labels. \n", - "* The `usedIndices` object is a `numpy` 1d-array containing the indices of the examples we want to learn from. \n", - "* The `viewsIndices` object is a `numpy` 1d-array containing the indices of the views we want to learn from. " - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "def fit_hdf5(self, DATASET, usedIndices=None, viewsIndices=None):\n", - " # Call the fit function of your own module\n", - " pass" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `predict_hdf5` method" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This method is used as an HDF5-compatible method similar to `sklearn`'s `predict` method. It has the same inputs as the `fit_hdf5` method but returns a 1d-array containing the labels of the requested examples (ordered as in `usedIndices`)." - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def predict_hdf5(self, DATASET, usedIndices=None, viewsIndices=None):\n", - " # Call the predict function of your own module\n", - " predictedLabels = None # Just to avoid any ipynb running error\n", - " return predictedLabels" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once you've added everything to the `NMCModule.py` file you are close to being able to run your algorithm on the platform, you just need to fill the `analyzeResults.py` file." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## `analyzeResults.py`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `analyzeResults.py` file is a module used to get a specific result analysis for your classifier.
You have, in order to run the platform, to add a unique function called `execute` that will run the analysis and return three different variables : \n", - "* `stringAnalysis` is a string that will be saved in a file to describe the classifier, its performance and may give some insights on the interpretation of its way to classify. \n", - "* `imagesAnalysis` is a dictionary where you can store images (as values) to describe the classifier & co., the keys will be the images names. \n", - "* `metricsScores` is a dictionary where the values are lists containing train and test scores, and the keys are the metrics names. ( `metricsScores = {\"accuracy_score\":[0.99, 0.10]}` )\n", - "The `execute` function has as inputs : \n", - "* `classifier` is a classifier object from your classifiers class\n", - "* `trainLabels` are the labels predicted for the train set by the classifier\n", - "* `testLabels` are the labels predicted for the test set by the classifier\n", - "* `DATASET` is the HDF5 dataset object\n", - "* `classificationKWARGS` is the dictionary named `NMCKWARGS` earlier\n", - "* `classificationIndices` is a triplet containing the learning indices, the validation indices and the testIndices for multiclass classification\n", - "* `LABELS_DICTIONARY` is a dictionary containing a label as a key and its name as a value\n", - "* `views` is the list of the views names used by the classifier\n", - "* `nbCores` is an `int` fixing the number of threads used by the platform \n", - "* `times` is a tuple containing the extraction time and the classification time\n", - "* `name` is the name of the database on which the platform is running\n", - "* `KFolds` is an `sklearn` kfold object used for the cross-validation\n", - "* `hyperParamSearch` is the type of the hyper parameters optimization method\n", - "* `nIter` is the number of iterations of the hyper parameters method\n", - "* `metrics` is the list of the metrics and their arguments\n", - "* `viewsIndices` is a 1d-array of the indices of the views used for classification\n", - "* `randomState` is a `numpy` RandomState object\n", - "* `labels` are the ground truth labels of the dataset" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The basic function analyzing results for all the classifiers looks like : " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "from ...
import Metrics\n", - "from ...utils.MultiviewResultAnalysis import printMetricScore, getMetricsScores\n", - "\n", - "def execute(classifier, trainLabels,\n", - " testLabels, DATASET,\n", - " classificationKWARGS, classificationIndices,\n", - " LABELS_DICTIONARY, views, nbCores, times,\n", - " name, KFolds,\n", - " hyperParamSearch, nIter, metrics,\n", - " viewsIndices, randomState, labels):\n", - " CLASS_LABELS = labels\n", - " learningIndices, validationIndices, testIndicesMulticlass = classificationIndices\n", - "\n", - " metricModule = getattr(Metrics, metrics[0][0])\n", - " if metrics[0][1] is not None:\n", - " metricKWARGS = dict((index, metricConfig) for index, metricConfig in enumerate(metrics[0][1]))\n", - " else:\n", - " metricKWARGS = {}\n", - " scoreOnTrain = metricModule.score(CLASS_LABELS[learningIndices], CLASS_LABELS[learningIndices], **metricKWARGS)\n", - " scoreOnTest = metricModule.score(CLASS_LABELS[validationIndices], testLabels, **metricKWARGS)\n", - "\n", - " # To be modified to fit to your classifier \n", - " classifierConfigurationString = \"with weights : \"+ \", \".join(map(str, list(classifier.weights))) + \", and integer : \"+str(classifier.integer)\n", - " # Modify the name of the classifier in these strings\n", - " stringAnalysis = \"\\t\\tResult for Multiview classification with NMC \"+ \\\n", - " \"\\n\\n\" + metrics[0][0] + \" :\\n\\t-On Train : \" + str(scoreOnTrain) + \"\\n\\t-On Test : \" + str(\n", - " scoreOnTest) + \\\n", - " \"\\n\\nDataset info :\\n\\t-Database name : \" + name + \"\\n\\t-Labels : \" + \\\n", - " ', '.join(LABELS_DICTIONARY.values()) + \"\\n\\t-Views : \" + ', '.join(views) + \"\\n\\t-\" + str(\n", - " KFolds.n_splits) + \\\n", - " \" folds\\n\\nClassification configuration : \\n\\t-Algorithm used : NMC \" + classifierConfigurationString\n", - "\n", - " metricsScores = getMetricsScores(metrics, trainLabels, testLabels,\n", - " validationIndices, learningIndices, labels)\n", - " stringAnalysis += printMetricScore(metricsScores, metrics)\n", - "\n", - " imagesAnalysis = {}\n", - " return stringAnalysis, imagesAnalysis, metricsScores" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once you have done this, your classifier is ready to be used by the platform, but you can add some description about your classifier in the analyzeResults file. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Adding arguments to avoid hyper parameter optimization" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In order to be able to test a specific set of arguments on this platform, you need to add some lines in the argument parser located in the file `Code/MonoMultiViewClassifiers/utils/execution.py` in the `parseTheArgs` function. 
What you need to do is to add a group of arguments, allowing you to pass the hyper parameters in the command line :" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "groupNMC = parser.add_argument_group('New Multiview Classifier arguments')\n", - "groupNMC.add_argument('--NMC_weights', metavar='FLOAT', action='store', nargs=\"+\",\n", - " help='Determine the weights of NMC', type=float,\n", - " default=[])\n", - "groupNMC.add_argument('--NMC_integer', metavar='INT', action='store',\n", - " help='Determine the integer of NMC', type=int,\n", - " default=42)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In order for the platform to use these arguments, you need to modify the `getArgs` function of the file `NMCModule.py`. \n" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "def getArgs(args, benchmark, views, viewsIndices, randomState, directory, resultsMonoview, classificationIndices):\n", - " argumentsList = []\n", - " nbViews = len(views)\n", - " arguments = {\"CL_type\": \"NMC\",\n", - " \"views\": views,\n", - " \"NB_VIEW\": len(views),\n", - " \"viewsIndices\": viewsIndices,\n", - " \"NB_CLASS\": len(args.CL_classes),\n", - " \"LABELS_NAMES\": args.CL_classes,\n", - " \"NMCKWARGS\": {\"weights\":args.NMC_weights, # Modified to take the args into account\n", - " \"integer\":args.NMC_integer, # Modified to take the args into account\n", - " \"nbViews\":5}}\n", - " argumentsList.append(arguments)\n", - " return argumentsList" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 2", - "language": "python3", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 2.0 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "2.7.13" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} diff --git a/docs/source/monomulti/multiview_classifiers/classifiers.rst b/docs/source/monomulti/multiview_classifiers/classifiers.rst new file mode 100644 index 0000000000000000000000000000000000000000..0ca3191db60590927cad6cf3a3326af6fa95108f --- /dev/null +++ b/docs/source/monomulti/multiview_classifiers/classifiers.rst @@ -0,0 +1,8 @@ +Classifiers +=========== + +.. autosummary:: + :toctree: DIRNAME + + multiview_platform.mono_multi_view_classifiers.monoview_classifiers + diff --git a/docs/source/monomulti/multiview_classifiers/diversity_fusion.rst b/docs/source/monomulti/multiview_classifiers/diversity_fusion.rst index 6d8e675c2c9d085564f4f796cc7e079b9629de73..a60545a25a3e138686f1676a92ff43a79d577a30 100644 --- a/docs/source/monomulti/multiview_classifiers/diversity_fusion.rst +++ b/docs/source/monomulti/multiview_classifiers/diversity_fusion.rst @@ -1,5 +1,5 @@ Diversity Fusion Classifiers ============================ -.. automodule:: multiview_platform.MonoMultiViewClassifiers.Multiview.Additions.diversity_utils -:members: +.. 
automodule:: multiview_platform.mono_multi_view_classifiers.multiview.additions.diversity_utils + :members: diff --git a/docs/source/monomulti/utils/multiclass.rst b/docs/source/monomulti/utils/multiclass.rst index cd86315269fe085ddea94183b21d62e2155e3083..9f79bc8d88d05c975e94afb014f77f6e2a38c58a 100644 --- a/docs/source/monomulti/utils/multiclass.rst +++ b/docs/source/monomulti/utils/multiclass.rst @@ -1,6 +1,6 @@ Utils Multiclass module ======================= -.. automodule:: multiview_platform.MonoMultiViewClassifiers.utils.Multiclass +.. automodule:: multiview_platform.mono_multi_view_classifiers.utils.multiclass :members: - :inherited-members: \ No newline at end of file + :inherited-members: diff --git a/docs/source/monomultidoc.rst b/docs/source/monomultidoc.rst index b25fd849aaefb289724abedd80a1a95ee03d3938..4ada7eec84a07f26d75a856b86fb9d38b7a4e166 100644 --- a/docs/source/monomultidoc.rst +++ b/docs/source/monomultidoc.rst @@ -2,11 +2,11 @@ Mono and mutliview classification ================================= .. toctree:: - :maxdepth: 1 + :maxdepth: 3 :caption: Contents: monomulti/metrics - monomulti/monoview_classifier + monomulti/monoview_classifier/classifiers monomulti/multiview_classifier monomulti/exec_classif monomulti/multiview_classifiers/diversity_fusion diff --git a/docs/source/readme_link.rst b/docs/source/readme_link.rst index c27a295c65fae03addd457470f676aff8a9c5b9d..b80fd35dd3d78ff671efacf369adbd7aca3279ec 100644 --- a/docs/source/readme_link.rst +++ b/docs/source/readme_link.rst @@ -1,4 +1,4 @@ Readme ====== -.. mdinclude:: ../../README.md \ No newline at end of file +.. mdinclude:: ../../README.md diff --git a/multiview_platform/execute.py b/multiview_platform/execute.py index 3dcb05c6827a59ba1f2f938e5fc6a118937061c1..53d4fcc9fdb31920c40ae1802db43fb17f7058e3 100644 --- a/multiview_platform/execute.py +++ b/multiview_platform/execute.py @@ -1,7 +1,7 @@ """This is the execution module, used to execute the code""" -def Exec(): +def exec(): import versions versions.testVersions() import sys @@ -11,4 +11,4 @@ def Exec(): if __name__ == "__main__": - Exec() + exec() diff --git a/multiview_platform/mono_multi_view_classifiers/monoview/exec_classif_mono_view.py b/multiview_platform/mono_multi_view_classifiers/monoview/exec_classif_mono_view.py index d228a56b766402017ebe9316ed9b1df45808d259..1870e9b74a855c6e89fe6a5170d06b0bc5c2d3c7 100644 --- a/multiview_platform/mono_multi_view_classifiers/monoview/exec_classif_mono_view.py +++ b/multiview_platform/mono_multi_view_classifiers/monoview/exec_classif_mono_view.py @@ -255,7 +255,7 @@ def saveResults(stringAnalysis, outputFileName, full_labels_pred, y_train_pred, # help='Name of the view used', default='View0') # groupStandard.add_argument('--pathF', metavar='STRING', action='store', # help='Path to the database hdf5 file', -# default='../../../Data/Plausible') +# default='../../../data/Plausible') # groupStandard.add_argument('--directory', metavar='STRING', action='store', # help='Path of the output directory', default='') # groupStandard.add_argument('--labelsNames', metavar='STRING', diff --git a/multiview_platform/mono_multi_view_classifiers/monoview/export_results.py b/multiview_platform/mono_multi_view_classifiers/monoview/export_results.py index b2d969b3c0f30063086c5004bf301e5c35c6b7cb..ad9a7f3c715dd1fb18e8add81e199a39d9234c08 100644 --- a/multiview_platform/mono_multi_view_classifiers/monoview/export_results.py +++ b/multiview_platform/mono_multi_view_classifiers/monoview/export_results.py @@ -14,7 +14,7 @@ import 
pandas as pd # for Series and DataFrames from matplotlib.offsetbox import AnchoredOffsetbox, TextArea, \ HPacker # to generate the Annotations in plot from pylab import rcParams # to change size of plot -from scipy.interpolate import interp1d # to Interpolate Data +from scipy.interpolate import interp1d # to interpolate data from sklearn import metrics # For statistics on classification # Import own modules @@ -122,8 +122,8 @@ def showScoreTime(directory, filename, store, resScore, resTime, rangeX, ax1.add_artist(anchored_box) fig.subplots_adjust(top=0.7) - ax1.legend(['Score Data', 'Score Interpolated'], loc='upper left') - ax2.legend(['Time Data', 'Time Interpolated'], loc='lower right') + ax1.legend(['Score data', 'Score Interpolated'], loc='upper left') + ax2.legend(['Time data', 'Time Interpolated'], loc='lower right') plt.title(fig_desc, fontsize=18) diff --git a/multiview_platform/mono_multi_view_classifiers/monoview/monoview_utils.py b/multiview_platform/mono_multi_view_classifiers/monoview/monoview_utils.py index 9f75e36ff13f02c2a4611ba33d344b18879acb07..4f3500d77a90381a74f759687a029ca9166f6199 100644 --- a/multiview_platform/mono_multi_view_classifiers/monoview/monoview_utils.py +++ b/multiview_platform/mono_multi_view_classifiers/monoview/monoview_utils.py @@ -273,7 +273,7 @@ class MonoviewResult(object): # return trainingExamplesIndices -##### Generating Test and Train Data +##### Generating Test and Train data # def calcTrainTestOwn(X,y,split): # # classLabels = pd.Series(y) @@ -383,7 +383,7 @@ class MonoviewResult(object): # This means the oob method is n_observations/3 times faster to train than the leave-one-out method. # -# X_test: Test Data +# X_test: Test data # y_test: Test Labels # num_estimators: number of trees # def MonoviewClassifRandomForest(X_train, y_train, nbFolds=4, nbCores=1, **kwargs): diff --git a/multiview_platform/mono_multi_view_classifiers/utils/execution.py b/multiview_platform/mono_multi_view_classifiers/utils/execution.py index 5a32172bc7b1d4f4cd24bacf6fa9cc86ffd373d4..da36fb45b55050c03a11e6472938d069954c367f 100644 --- a/multiview_platform/mono_multi_view_classifiers/utils/execution.py +++ b/multiview_platform/mono_multi_view_classifiers/utils/execution.py @@ -45,7 +45,7 @@ def parseTheArgs(arguments): # groupStandard.add_argument('--pathF', metavar='STRING', action='store', # help='Path to the hdf5 dataset or database ' # 'folder (default: %(default)s)', -# default='../Data/') +# default='../data/') # groupStandard.add_argument('--nice', metavar='INT', action='store', # type=int, # help='Niceness for the processes', default=0) diff --git a/multiview_platform/mono_multi_view_classifiers/utils/get_multiview_db.py b/multiview_platform/mono_multi_view_classifiers/utils/get_multiview_db.py index 5821bf0261b618ba2d945c3e20a20e70bce25202..22246c81090ba680fda86bfb9f508f6d51090390 100644 --- a/multiview_platform/mono_multi_view_classifiers/utils/get_multiview_db.py +++ b/multiview_platform/mono_multi_view_classifiers/utils/get_multiview_db.py @@ -610,25 +610,25 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # def getMultiOmicDBcsv(features, path, name, NB_CLASS, LABELS_NAMES, randomState): # datasetFile = h5py.File(path + "MultiOmic.hdf5", "w") # -# logging.debug("Start:\t Getting Methylation Data") +# logging.debug("Start:\t Getting Methylation data") # methylData = np.genfromtxt(path + "matching_methyl.csv", delimiter=',') # methylDset = datasetFile.create_dataset("View0", methylData.shape) # methylDset[...]
= methylData # methylDset.attrs["name"] = "Methyl" # methylDset.attrs["sparse"] = False # methylDset.attrs["binary"] = False -# logging.debug("Done:\t Getting Methylation Data") +# logging.debug("Done:\t Getting Methylation data") # -# logging.debug("Start:\t Getting MiRNA Data") +# logging.debug("Start:\t Getting MiRNA data") # mirnaData = np.genfromtxt(path + "matching_mirna.csv", delimiter=',') # mirnaDset = datasetFile.create_dataset("View1", mirnaData.shape) # mirnaDset[...] = mirnaData # mirnaDset.attrs["name"] = "MiRNA_" # mirnaDset.attrs["sparse"] = False # mirnaDset.attrs["binary"] = False -# logging.debug("Done:\t Getting MiRNA Data") +# logging.debug("Done:\t Getting MiRNA data") # -# logging.debug("Start:\t Getting RNASeq Data") +# logging.debug("Start:\t Getting RNASeq data") # rnaseqData = np.genfromtxt(path + "matching_rnaseq.csv", delimiter=',') # uselessRows = [] # for rowIndex, row in enumerate(np.transpose(rnaseqData)): @@ -640,16 +640,16 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # rnaseqDset.attrs["name"] = "RNASeq_" # rnaseqDset.attrs["sparse"] = False # rnaseqDset.attrs["binary"] = False -# logging.debug("Done:\t Getting RNASeq Data") +# logging.debug("Done:\t Getting RNASeq data") # -# logging.debug("Start:\t Getting Clinical Data") +# logging.debug("Start:\t Getting Clinical data") # clinical = np.genfromtxt(path + "clinicalMatrix.csv", delimiter=',') # clinicalDset = datasetFile.create_dataset("View3", clinical.shape) # clinicalDset[...] = clinical # clinicalDset.attrs["name"] = "Clinic" # clinicalDset.attrs["sparse"] = False # clinicalDset.attrs["binary"] = False -# logging.debug("Done:\t Getting Clinical Data") +# logging.debug("Done:\t Getting Clinical data") # # labelFile = open(path + 'brca_labels_triple-negatif.csv') # labels = np.array([int(line.strip().split(',')[1]) for line in labelFile]) @@ -849,11 +849,11 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # def getKMultiOmicDBcsv(features, path, name, NB_CLASS, LABELS_NAMES): # datasetFile = h5py.File(path + "KMultiOmic.hdf5", "w") # -# # logging.debug("Start:\t Getting Methylation Data") +# # logging.debug("Start:\t Getting Methylation data") # methylData = np.genfromtxt(path + "matching_methyl.csv", delimiter=',') -# logging.debug("Done:\t Getting Methylation Data") +# logging.debug("Done:\t Getting Methylation data") # -# logging.debug("Start:\t Getting Sorted Methyl Data") +# logging.debug("Start:\t Getting Sorted Methyl data") # Methyl = methylData # sortedMethylGeneIndices = np.zeros(methylData.shape, dtype=int) # MethylRanking = np.zeros(methylData.shape, dtype=int) @@ -864,9 +864,9 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # sortedMethylGeneIndices[exampleIndex] = sortedMethylIndicesArray # for geneIndex in range(Methyl.shape[1]): # MethylRanking[exampleIndex, sortedMethylIndicesArray[geneIndex]] = geneIndex -# logging.debug("Done:\t Getting Sorted Methyl Data") +# logging.debug("Done:\t Getting Sorted Methyl data") # -# logging.debug("Start:\t Getting Binarized Methyl Data") +# logging.debug("Start:\t Getting Binarized Methyl data") # k = findClosestPowerOfTwo(9) - 1 # try: # factorizedLeftBaseMatrix = np.genfromtxt( @@ -884,9 +884,9 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # bMethylDset.attrs["name"] = "BMethyl" + str(k) # bMethylDset.attrs["sparse"] = False # bMethylDset.attrs["binary"] = True -# logging.debug("Done:\t Getting Binarized Methyl Data") +# logging.debug("Done:\t 
Getting Binarized Methyl data") # -# logging.debug("Start:\t Getting Binned Methyl Data") +# logging.debug("Start:\t Getting Binned Methyl data") # lenBins = 3298 # nbBins = 9 # overlapping = 463 @@ -906,9 +906,9 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # binnedMethyl.attrs["name"] = "bMethyl" + str(nbBins) # binnedMethyl.attrs["sparse"] = False # binnedMethyl.attrs["binary"] = True -# logging.debug("Done:\t Getting Binned Methyl Data") +# logging.debug("Done:\t Getting Binned Methyl data") # -# logging.debug("Start:\t Getting Binarized Methyl Data") +# logging.debug("Start:\t Getting Binarized Methyl data") # k = findClosestPowerOfTwo(17) - 1 # try: # factorizedLeftBaseMatrix = np.genfromtxt( @@ -926,9 +926,9 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # bMethylDset.attrs["name"] = "BMethyl" + str(k) # bMethylDset.attrs["sparse"] = False # bMethylDset.attrs["binary"] = True -# logging.debug("Done:\t Getting Binarized Methyl Data") +# logging.debug("Done:\t Getting Binarized Methyl data") # -# logging.debug("Start:\t Getting Binned Methyl Data") +# logging.debug("Start:\t Getting Binned Methyl data") # lenBins = 2038 # nbBins = 16 # overlapping = 442 @@ -948,7 +948,7 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # binnedMethyl.attrs["name"] = "bMethyl" + str(nbBins) # binnedMethyl.attrs["sparse"] = False # binnedMethyl.attrs["binary"] = True -# logging.debug("Done:\t Getting Binned Methyl Data") +# logging.debug("Done:\t Getting Binned Methyl data") # # labelFile = open(path + 'brca_labels_triple-negatif.csv') # labels = np.array([int(line.strip().split(',')[1]) for line in labelFile]) @@ -977,16 +977,16 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # def getModifiedMultiOmicDBcsv(features, path, name, NB_CLASS, LABELS_NAMES): # datasetFile = h5py.File(path + "ModifiedMultiOmic.hdf5", "w") # -# logging.debug("Start:\t Getting Methylation Data") +# logging.debug("Start:\t Getting Methylation data") # methylData = np.genfromtxt(path + "matching_methyl.csv", delimiter=',') # methylDset = datasetFile.create_dataset("View0", methylData.shape) # methylDset[...] = methylData # methylDset.attrs["name"] = "Methyl_" # methylDset.attrs["sparse"] = False # methylDset.attrs["binary"] = False -# logging.debug("Done:\t Getting Methylation Data") +# logging.debug("Done:\t Getting Methylation data") # -# logging.debug("Start:\t Getting Sorted Methyl Data") +# logging.debug("Start:\t Getting Sorted Methyl data") # Methyl = datasetFile["View0"][...] 
# sortedMethylGeneIndices = np.zeros(datasetFile.get("View0").shape, dtype=int) # MethylRanking = np.zeros(datasetFile.get("View0").shape, dtype=int) @@ -1001,9 +1001,9 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # mMethylDset.attrs["name"] = "SMethyl" # mMethylDset.attrs["sparse"] = False # mMethylDset.attrs["binary"] = False -# logging.debug("Done:\t Getting Sorted Methyl Data") +# logging.debug("Done:\t Getting Sorted Methyl data") # -# logging.debug("Start:\t Getting Binarized Methyl Data") +# logging.debug("Start:\t Getting Binarized Methyl data") # k = findClosestPowerOfTwo(58) - 1 # try: # factorizedLeftBaseMatrix = np.genfromtxt( @@ -1021,9 +1021,9 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # bMethylDset.attrs["name"] = "BMethyl" # bMethylDset.attrs["sparse"] = False # bMethylDset.attrs["binary"] = True -# logging.debug("Done:\t Getting Binarized Methyl Data") +# logging.debug("Done:\t Getting Binarized Methyl data") # -# logging.debug("Start:\t Getting Binned Methyl Data") +# logging.debug("Start:\t Getting Binned Methyl data") # lenBins = 2095 # nbBins = 58 # overlapping = 1676 @@ -1043,18 +1043,18 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # binnedMethyl.attrs["name"] = "bMethyl" # binnedMethyl.attrs["sparse"] = False # binnedMethyl.attrs["binary"] = True -# logging.debug("Done:\t Getting Binned Methyl Data") +# logging.debug("Done:\t Getting Binned Methyl data") # -# logging.debug("Start:\t Getting MiRNA Data") +# logging.debug("Start:\t Getting MiRNA data") # mirnaData = np.genfromtxt(path + "matching_mirna.csv", delimiter=',') # mirnaDset = datasetFile.create_dataset("View1", mirnaData.shape) # mirnaDset[...] = mirnaData # mirnaDset.attrs["name"] = "MiRNA__" # mirnaDset.attrs["sparse"] = False # mirnaDset.attrs["binary"] = False -# logging.debug("Done:\t Getting MiRNA Data") +# logging.debug("Done:\t Getting MiRNA data") # -# logging.debug("Start:\t Getting Sorted MiRNA Data") +# logging.debug("Start:\t Getting Sorted MiRNA data") # MiRNA = datasetFile["View1"][...] 
# sortedMiRNAGeneIndices = np.zeros(datasetFile.get("View1").shape, dtype=int) # MiRNARanking = np.zeros(datasetFile.get("View1").shape, dtype=int) @@ -1069,9 +1069,9 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # mmirnaDset.attrs["name"] = "SMiRNA_" # mmirnaDset.attrs["sparse"] = False # mmirnaDset.attrs["binary"] = False -# logging.debug("Done:\t Getting Sorted MiRNA Data") +# logging.debug("Done:\t Getting Sorted MiRNA data") # -# logging.debug("Start:\t Getting Binarized MiRNA Data") +# logging.debug("Start:\t Getting Binarized MiRNA data") # k = findClosestPowerOfTwo(517) - 1 # try: # factorizedLeftBaseMatrix = np.genfromtxt( @@ -1089,9 +1089,9 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # bmirnaDset.attrs["name"] = "BMiRNA_" # bmirnaDset.attrs["sparse"] = False # bmirnaDset.attrs["binary"] = True -# logging.debug("Done:\t Getting Binarized MiRNA Data") +# logging.debug("Done:\t Getting Binarized MiRNA data") # -# logging.debug("Start:\t Getting Binned MiRNA Data") +# logging.debug("Start:\t Getting Binned MiRNA data") # lenBins = 14 # nbBins = 517 # overlapping = 12 @@ -1111,9 +1111,9 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # binnedMiRNA.attrs["name"] = "bMiRNA_" # binnedMiRNA.attrs["sparse"] = False # binnedMiRNA.attrs["binary"] = True -# logging.debug("Done:\t Getting Binned MiRNA Data") +# logging.debug("Done:\t Getting Binned MiRNA data") # -# logging.debug("Start:\t Getting RNASeq Data") +# logging.debug("Start:\t Getting RNASeq data") # rnaseqData = np.genfromtxt(path + "matching_rnaseq.csv", delimiter=',') # uselessRows = [] # for rowIndex, row in enumerate(np.transpose(rnaseqData)): @@ -1125,9 +1125,9 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # rnaseqDset.attrs["name"] = "RNASeq_" # rnaseqDset.attrs["sparse"] = False # rnaseqDset.attrs["binary"] = False -# logging.debug("Done:\t Getting RNASeq Data") +# logging.debug("Done:\t Getting RNASeq data") # -# logging.debug("Start:\t Getting Sorted RNASeq Data") +# logging.debug("Start:\t Getting Sorted RNASeq data") # RNASeq = datasetFile["View2"][...] 
# sortedRNASeqGeneIndices = np.zeros(datasetFile.get("View2").shape, dtype=int) # RNASeqRanking = np.zeros(datasetFile.get("View2").shape, dtype=int) @@ -1142,9 +1142,9 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # mrnaseqDset.attrs["name"] = "SRNASeq" # mrnaseqDset.attrs["sparse"] = False # mrnaseqDset.attrs["binary"] = False -# logging.debug("Done:\t Getting Sorted RNASeq Data") +# logging.debug("Done:\t Getting Sorted RNASeq data") # -# logging.debug("Start:\t Getting Binarized RNASeq Data") +# logging.debug("Start:\t Getting Binarized RNASeq data") # k = findClosestPowerOfTwo(100) - 1 # try: # factorizedLeftBaseMatrix = np.genfromtxt( @@ -1163,9 +1163,9 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # brnaseqDset.attrs["name"] = "BRNASeq" # brnaseqDset.attrs["sparse"] = False # brnaseqDset.attrs["binary"] = True -# logging.debug("Done:\t Getting Binarized RNASeq Data") +# logging.debug("Done:\t Getting Binarized RNASeq data") # -# logging.debug("Start:\t Getting Binned RNASeq Data") +# logging.debug("Start:\t Getting Binned RNASeq data") # lenBins = 986 # nbBins = 142 # overlapping = 493 @@ -1185,18 +1185,18 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # binnedRNASeq.attrs["name"] = "bRNASeq" # binnedRNASeq.attrs["sparse"] = False # binnedRNASeq.attrs["binary"] = True -# logging.debug("Done:\t Getting Binned RNASeq Data") +# logging.debug("Done:\t Getting Binned RNASeq data") # -# logging.debug("Start:\t Getting Clinical Data") +# logging.debug("Start:\t Getting Clinical data") # clinical = np.genfromtxt(path + "clinicalMatrix.csv", delimiter=',') # clinicalDset = datasetFile.create_dataset("View3", clinical.shape) # clinicalDset[...] = clinical # clinicalDset.attrs["name"] = "Clinic_" # clinicalDset.attrs["sparse"] = False # clinicalDset.attrs["binary"] = False -# logging.debug("Done:\t Getting Clinical Data") +# logging.debug("Done:\t Getting Clinical data") # -# logging.debug("Start:\t Getting Binarized Clinical Data") +# logging.debug("Start:\t Getting Binarized Clinical data") # binarized_clinical = np.zeros((347, 1951), dtype=np.uint8) # nb_already_done = 0 # for feqtureIndex, feature in enumerate(np.transpose(clinical)): @@ -1210,9 +1210,9 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # bClinicalDset.attrs["name"] = "bClinic" # bClinicalDset.attrs["sparse"] = False # bClinicalDset.attrs["binary"] = True -# logging.debug("Done:\t Getting Binarized Clinical Data") +# logging.debug("Done:\t Getting Binarized Clinical data") # -# # logging.debug("Start:\t Getting Adjacence RNASeq Data") +# # logging.debug("Start:\t Getting Adjacence RNASeq data") # # sparseAdjRNASeq = getAdjacenceMatrix(RNASeqRanking, sortedRNASeqGeneIndices, k=findClosestPowerOfTwo(10)-1) # # sparseAdjRNASeqGrp = datasetFile.create_group("View6") # # dataDset = sparseAdjRNASeqGrp.create_dataset("data", sparseAdjRNASeq.data.shape, data=sparseAdjRNASeq.data) @@ -1223,7 +1223,7 @@ def getClassicDBcsv(views, pathF, nameDB, NB_CLASS, askedLabelsNames, # # sparseAdjRNASeqGrp.attrs["name"]="ARNASeq" # # sparseAdjRNASeqGrp.attrs["sparse"]=True # # sparseAdjRNASeqGrp.attrs["shape"]=sparseAdjRNASeq.shape -# # logging.debug("Done:\t Getting Adjacence RNASeq Data") +# # logging.debug("Done:\t Getting Adjacence RNASeq data") # # labelFile = open(path + 'brca_labels_triple-negatif.csv') # labels = np.array([int(line.strip().split(',')[1]) for line in labelFile]) diff --git a/multiview_platform/tests/test_ExecClassif.py 
b/multiview_platform/tests/test_ExecClassif.py index 9da27eefe43dec9c27030a01ca07fb3f819519e8..3420821a5dfe1b8cff3d4f0b8ffde77d1f12f5c5 100644 --- a/multiview_platform/tests/test_ExecClassif.py +++ b/multiview_platform/tests/test_ExecClassif.py @@ -412,7 +412,7 @@ class Test_execOneBenchmark_multicore(unittest.TestCase): # help='Name of the views selected for learning (default: %(default)s)', # default=['']) # groupStandard.add_argument('--pathF', metavar='STRING', action='store', help='Path to the views (default: %(default)s)', -# default='/home/bbauvin/Documents/Data/Data_multi_omics/') +# default='/home/bbauvin/Documents/data/Data_multi_omics/') # groupStandard.add_argument('--nice', metavar='INT', action='store', type=int, # help='Niceness for the process', default=0) # groupStandard.add_argument('--randomState', metavar='STRING', action='store', diff --git a/multiview_platform/tests/tmp_tests/test_file.hdf5 b/multiview_platform/tests/tmp_tests/test_file.hdf5 new file mode 100644 index 0000000000000000000000000000000000000000..61b8ac6df7647609d0c294b06be19ce42362af47 Binary files /dev/null and b/multiview_platform/tests/tmp_tests/test_file.hdf5 differ diff --git a/setup.py b/setup.py index 08366b42e43f861ff8aac02b9eed65ec7b0b5ecf..8ed4b6c46acd183045a7327669cd8b14acfc34a5 100644 --- a/setup.py +++ b/setup.py @@ -54,7 +54,7 @@ def setup_package(): # A URL pointing to the official page of your lib url='http://github.com/babau1/multiview-machine-learning-omis/', install_requires=['numpy>=1.8', 'scipy>=0.16','scikit-learn==0.19', - 'h5py', 'joblib', 'pyscm', 'pandas'], + 'h5py', 'joblib', 'pyscm', 'pandas', 'm2r'], # It is customary to add some metadata about your lib # so that robots can easily classify it. # The list of authorized markers is long: @@ -80,7 +80,7 @@ def setup_package(): # The syntax is "command-name-to-create = package.module:function". entry_points={ 'console_scripts': [ - 'exec_multiview = multiview_platform.Exec:Exec', + 'exec_multiview = multiview_platform.execute:exec', ], },
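For reference, the README's `Dataset compatibility` section above describes the HDF5 layout the platform consumes. Below is a minimal sketch of building such a file: the `View<i>` datasets and their `name`/`sparse`/`binary` attributes mirror the (commented-out) builders in `get_multiview_db.py` above, while the `Labels` dataset name and the three `Metadata` attributes are assumptions, since the README excerpt only states that the `Metadata` group carries at least 3 attributes.

```python
import h5py
import numpy as np

# Hypothetical two-view dataset: 100 examples, 10 features per view,
# binary labels. The attribute names below are assumptions, not a
# schema confirmed by this diff.
with h5py.File("data/MyDataset.hdf5", "w") as dataset_file:
    for view_index in range(2):
        view = dataset_file.create_dataset("View%d" % view_index,
                                           data=np.random.rand(100, 10))
        view.attrs["name"] = "ViewName%d" % view_index  # display name
        view.attrs["sparse"] = False  # dense ndarray view
        view.attrs["binary"] = False  # real-valued features
    dataset_file.create_dataset("Labels",
                                data=np.random.randint(0, 2, 100))
    metadata = dataset_file.create_group("Metadata")
    metadata.attrs["nbView"] = 2           # assumed attribute name
    metadata.attrs["nbClass"] = 2          # assumed attribute name
    metadata.attrs["datasetLength"] = 100  # assumed attribute name
```

Such a file could then be passed to the benchmark with `python execute.py -log --name MyDataset --type .hdf5 --pathF data/`, matching the invocation shown in the README section above.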