Commit a4e4d897 authored by Baptiste Bauvin

Did a lot of doc on sphinx

parent d429937e
@@ -27,6 +27,8 @@
#
# needs_sphinx = '1.0'
add_module_names = False
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
@@ -34,6 +36,8 @@ extensions = ['sphinx.ext.autodoc',
'sphinx.ext.doctest',
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'numpydoc',
'nbsphinx',
'sphinx.ext.coverage',
'sphinx.ext.mathjax',
'sphinx.ext.ifconfig',
......
@@ -11,7 +11,7 @@ Welcome to MultiviewPlatform's documentation!
:caption: Contents:
api
examples
.. examples
......
Classification execution module
===============================
.. automodule:: multiview_platform.MonoMultiViewClassifiers.ExecClassif
:members:
:inherited-members:
\ No newline at end of file
Welcome to the metrics documentation!
=============================================
Metrics framework
=================
.. automodule:: multiview_platform.Metrics.framework
.. automodule:: multiview_platform.MonoMultiViewClassifiers.Metrics.framework
:members:
:inherited-members:
\ No newline at end of file
%% Cell type:markdown id: tags:
# Randomized example selection for classification
%% Cell type:markdown id: tags:
## Train/test split generation
%% Cell type:markdown id: tags:
The train/test splits are generated in the `execution.genSplits` function. Its task is to generate a train/test split for each statistical iteration. In order to do that, it is fed the following inputs:
* `labels` are the data labels for the whole dataset,
* `splitRatio` is a real number giving the ratio |test|/|all|,
* `statsIterRandomStates` is a list of `numpy` random states used to generate reproducible pseudo-random numbers.
The main operation in this function is done by the `sklearn.model_selection.StratifiedShuffleSplit` class, which returns folds made by preserving the percentage of samples of each class.
In this case we ask it to split the dataset into two subsets with the requested test size. It then returns a shuffled train/test split while preserving the percentage of samples of each class.
We store the example indices in two `np.array`s called `trainIndices` and `testIndices`.
All the algorithms will then train (hyper-parameter cross-validation & learning) on `trainIndices`.
%% Cell type:code id: tags:
``` python
import numpy as np
import sklearn.model_selection


def genSplits(labels, splitRatio, statsIterRandomStates):
    """Used to generate the train/test splits using one or multiple random states.

    classificationIndices is a list of train/test splits."""
    indices = np.arange(len(labels))
    splits = []
    for randomState in statsIterRandomStates:
        # One stratified shuffle split per statistical iteration
        foldsObj = sklearn.model_selection.StratifiedShuffleSplit(n_splits=1,
                                                                  random_state=randomState,
                                                                  test_size=splitRatio)
        folds = foldsObj.split(indices, labels)
        for fold in folds:
            train_fold, test_fold = fold
            trainIndices = indices[train_fold]
            testIndices = indices[test_fold]
            splits.append([trainIndices, testIndices])
    return splits
```
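%% Cell type:markdown id: tags:
Below is a small usage sketch of `genSplits`. The labels, the random states and the `splitRatio` value are purely illustrative assumptions; only the call signature comes from the function above.
%% Cell type:code id: tags:
``` python
import numpy as np

# Hypothetical toy labels: 10 examples, 2 balanced classes
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
# One random state per statistical iteration (here, two iterations)
statsIterRandomStates = [np.random.RandomState(42), np.random.RandomState(43)]

splits = genSplits(labels, splitRatio=0.2, statsIterRandomStates=statsIterRandomStates)
for trainIndices, testIndices in splits:
    # 8 training indices and 2 test indices, one test example per class
    print(trainIndices, testIndices)
```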
%% Cell type:markdown id: tags:
## Multiclass problems
%% Cell type:markdown id: tags:
To be able to use the platform on multiclass problems, a one-versus-one method is implemented.
In order to use one-versus-one, we need to modify the train/test splits generated by the previously described function.
If the problem the platform is asked to solve is multiclass, it will generate all the possible two-class combinations in order to divide the main problem into multiple biclass ones.
In order to adapt each split, the `genMulticlassLabels` function will create new train/test splits (a code sketch of this procedure is given after this description) by:
* Generating an `oldIndices` list containing all the example indices whose label is in the combination,
* Generating a new train split by selecting only the indices of `trainIndices` whose labels are in the combination,
* Doing the same thing for the test indices,
* Copying the old `testIndices` variable into a new one called `testIndicesMulticlass` that will be used to predict on the entire dataset once the algorithm has learned to distinguish the two classes,
* Generating a new `labels` array by replacing all the labels that are not in the combination with -100 to flag them as unseen during the training phase.
The function will then return a triplet:
* `multiclassLabels` is a list containing, for each combination, the newly generated labels with ones and zeros for the two labels of the combination and -100 for the others,
* `labelsIndices` is a list containing all the combinations,
* `indicesMulticlass` is a list containing triplets for each statistical iteration:
  * `trainIndices` are the indices used for training, picked only in the two classes of the combination (at the second step of the previous list),
  * `testIndices` are the indices used for testing the biclass-generalization capacity of each biclass classifier learned on `trainIndices`, picked only in the two classes of the combination (at the third step of the previous list),
  * `testIndicesMulticlass` are the indices described at the fourth step of the previous list.
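%% Cell type:markdown id: tags:
The following cell is a minimal sketch of this adaptation, written to follow the description above. It is not the platform's actual `genMulticlassLabels` implementation, and the function name, exact return layout and variable names are assumptions.
%% Cell type:code id: tags:
``` python
import itertools
import numpy as np


def genMulticlassLabels_sketch(labels, splits):
    """Sketch: derive biclass splits and labels for each two-class combination."""
    labels = np.asarray(labels)
    combinations = list(itertools.combinations(sorted(set(labels)), 2))
    multiclassLabels, labelsIndices, indicesMulticlass = [], [], []
    for combination in combinations:
        iterationSplits = []
        for trainIndices, testIndices in splits:
            # Keep only the indices whose label belongs to the combination
            newTrainIndices = np.array([i for i in trainIndices if labels[i] in combination])
            newTestIndices = np.array([i for i in testIndices if labels[i] in combination])
            # The whole old test set is kept to evaluate multiclass predictions later
            testIndicesMulticlass = np.array(testIndices)
            iterationSplits.append([newTrainIndices, newTestIndices, testIndicesMulticlass])
        # 1/0 for the two classes of the combination, -100 flags unseen labels
        newLabels = np.full(len(labels), -100, dtype=int)
        newLabels[labels == combination[0]] = 1
        newLabels[labels == combination[1]] = 0
        multiclassLabels.append(newLabels)
        labelsIndices.append(combination)
        indicesMulticlass.append(iterationSplits)
    return multiclassLabels, labelsIndices, indicesMulticlass
```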
%% Cell type:markdown id: tags:
## Cross-validation folds
%% Cell type:markdown id: tags:
The cross-validation folds are generated using a `StratifiedKFold` object from the `sklearn.model_selection` library.
* For all the **monoview** algorithms, these objects (one for each statistical iteration) are then fed to a `sklearn.model_selection.RandomizedSearchCV` object, so nothing custom is done about the cross-validation folds in the monoview case.
* In the **multiview** case, they are used in the `randomizedSearch` function of the `utils.HyperParametersSearch` module. There, they split the learning set with `multiviewFolds = KFolds.split(learningIndices, labels[learningIndices])` and are then iterated over with `for trainIndices, testIndices in multiviewFolds:` (a small sketch is given after this list).
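%% Cell type:markdown id: tags:
A minimal sketch of the multiview fold usage is given below. The labels, the learning indices and the number of folds are illustrative assumptions; only the `KFolds.split(...)` call mirrors the code quoted above.
%% Cell type:code id: tags:
``` python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical data: 20 examples with 2 balanced classes, 15 of them in the learning set
labels = np.array([0, 1] * 10)
learningIndices = np.arange(15)

# One StratifiedKFold object per statistical iteration (only one shown here)
KFolds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Split only the learning set while preserving the class proportions in each fold
multiviewFolds = KFolds.split(learningIndices, labels[learningIndices])
for trainIndices, testIndices in multiviewFolds:
    # trainIndices and testIndices are positions inside learningIndices
    print(learningIndices[trainIndices], learningIndices[testIndices])
```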
%% Cell type:code id: tags:
``` python
```
Result analysis module
======================
.. automodule:: multiview_platform.MonoMultiViewClassifiers.ResultAnalysis
:members:
:inherited-members:
\ No newline at end of file
@@ -5,9 +5,7 @@ Mono and mutliview classification
:maxdepth: 1
:caption: Contents:
monomutli/metrics
monomutli/monoexec
monomutli/multiexec
monomutli/monoclf
monomutli/multiclf
monomutli/utils
\ No newline at end of file
monomulti/metrics
monomulti/exec_classif
monomulti/result_analysis
monomulti/multiview_classifier
\ No newline at end of file
@@ -38,7 +38,7 @@ def initBenchmark(args):
allMonoviewAlgos = [name for _, name, isPackage in
pkgutil.iter_modules(['./MonoMultiViewClassifiers/MonoviewClassifiers'])
if (not isPackage)]
if (not isPackage) and name not in ["framework"]]
benchmark["Monoview"] = allMonoviewAlgos
benchmark["Multiview"] = dict((multiviewPackageName, "_") for multiviewPackageName in allMultiviewPackages)
for multiviewPackageName in allMultiviewPackages:
@@ -389,7 +389,7 @@ def execClassif(arguments):
metrics = [metric.split(":") for metric in args.CL_metrics]
if metrics == [[""]]:
metricsNames = [name for _, name, isPackage
in pkgutil.iter_modules(['./MonoMultiViewClassifiers/Metrics']) if not isPackage and name not in ["log_loss", "matthews_corrcoef", "roc_auc_score"]]
in pkgutil.iter_modules(['./MonoMultiViewClassifiers/Metrics']) if not isPackage and name not in ["framework", "log_loss", "matthews_corrcoef", "roc_auc_score"]]
metrics = [[metricName] for metricName in metricsNames]
metrics = arangeMetrics(metrics, args.CL_metric_princ)
for metricIndex, metric in enumerate(metrics):
......
@@ -24,11 +24,9 @@ Define a getConfig function
"""
import os
modules = []
for module in os.listdir(os.path.dirname(os.path.realpath(__file__))):
if module in ['__init__.py', 'framework.py'] or module[-3:] != '.py':
if module in ['__init__.py'] or module[-3:] != '.py':
continue
__import__(module[:-3], locals(), globals(), [], 1)
pass
del module
del os
\ No newline at end of file
"""In ths file, we explain how to add a metric to the platform.
In order to do that, on needs to add a file with the following functions
which are mandatory for the metric to work with the platform.
"""
# Author-Info
__author__ = "Baptiste Bauvin"
__status__ = "Prototype" # Production, Development, Prototype
def score(y_true, y_pred, multiclass=False, **kwargs):
    """Get the metric's score from the ground truth (``y_true``) and predictions (``y_pred``).

    Parameters
    ----------
    y_true : array-like, shape = (n_samples,)
        Target values (class labels).
    y_pred : array-like, shape = (n_samples,)
        Predicted target values (class labels).
    multiclass : boolean (default=False)
        Parameter specifying whether the target values are multiclass or not.
    kwargs : dict
        The arguments stored in this dictionary must be keyed by strings of
        integers ("0", "1", ...) and are interpreted inside the function.

    Returns
    -------
    score : float
        Returns the score of the prediction.
    """
    score = 0.0
    return score
def get_scorer(**kwargs):
    """Get the metric's scorer, as in the ``sklearn.metrics`` package.

    Parameters
    ----------
    kwargs : dict
        The arguments stored in this dictionary must be keyed by strings of
        integers ("0", "1", ...) and are interpreted inside the function. These
        arguments are a configuration of the metric.

    Returns
    -------
    scorer : object
        Callable object that returns a scalar score; greater is better
        (cf. ``sklearn.metrics.make_scorer``).
    """
    scorer = None
    return scorer
def getConfig(**kwargs):
    """Get the metric's configuration as a string.

    Parameters
    ----------
    kwargs : dict
        The arguments stored in this dictionary must be keyed by strings of
        integers ("0", "1", ...) and are interpreted inside the function. These
        arguments are a configuration of the metric.

    Returns
    -------
    configString : string
        The string describing the metric's configuration.
    """
    configString = "This is a framework"
    return configString
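As an illustration, here is a minimal sketch of what a concrete metric module following this framework could look like. It wraps scikit-learn's `accuracy_score`; the configuration string and the handling of `kwargs` are simplified assumptions, not the platform's actual accuracy module.
``` python
# Minimal sketch of a concrete metric module built on the framework above.
# It simply wraps scikit-learn's accuracy score and ignores the kwargs configuration.
from sklearn.metrics import accuracy_score, make_scorer


def score(y_true, y_pred, multiclass=False, **kwargs):
    """Return the accuracy of ``y_pred`` with respect to ``y_true``."""
    return accuracy_score(y_true, y_pred)


def get_scorer(**kwargs):
    """Return a callable scorer, as built by ``sklearn.metrics.make_scorer``."""
    return make_scorer(accuracy_score, greater_is_better=True)


def getConfig(**kwargs):
    """Return a string describing the metric's configuration."""
    return "Accuracy score (no configuration parameters)"
```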