Commit a4e4d897 authored by Baptiste Bauvin

Did a lot of doc on sphinx

parent d429937e
@@ -27,6 +27,8 @@
#
# needs_sphinx = '1.0'
add_module_names = False
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
@@ -34,6 +36,8 @@ extensions = ['sphinx.ext.autodoc',
'sphinx.ext.doctest',
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'numpydoc',
'nbsphinx',
'sphinx.ext.coverage',
'sphinx.ext.mathjax',
'sphinx.ext.ifconfig',
......
@@ -11,7 +11,7 @@ Welcome to MultiviewPlatform's documentation!
:caption: Contents:
api
examples
.. examples
......
Classification execution module
===============================
.. automodule:: multiview_platform.MonoMultiViewClassifiers.ExecClassif
:members:
:inherited-members:
\ No newline at end of file
Welcome to the metrics documentation!
=============================================
Metrics framework
=================
.. automodule:: multiview_platform.Metrics.framework
.. automodule:: multiview_platform.MonoMultiViewClassifiers.Metrics.framework
:members:
:inherited-members:
\ No newline at end of file
%% Cell type:markdown id: tags:
# Randomized example selection for classification
%% Cell type:markdown id: tags:
## Train/test split generation
%% Cell type:markdown id: tags:
The train/test splits are generated in the `execution.genSplits` function. Its task is to generate a train/test split for each statistical iteration. In order to do that, it is fed the following inputs:
* `labels` are the data labels for the whole dataset,
* `splitRatio` is a real number giving the ratio |test|/|all|,
* `statsIterRandomStates` is a list of `numpy` random states used to generate reproducible pseudo-random numbers.
The main operation in this function is done by the `sklearn.model_selection.StratifiedShuffleSplit` class, which returns folds made by preserving the percentage of samples of each class.
In this case we ask it to split the dataset into two subsets with the requested test size. It then returns a shuffled train/test split while preserving the percentage of samples of each class.
We store the example indices in two `np.array`s called `trainIndices` and `testIndices`.
All the algorithms will then train (hyper-parameter cross-validation & learning) on `trainIndices`.
%% Cell type:code id: tags:
``` python
import numpy as np
import sklearn.model_selection


def genSplits(labels, splitRatio, statsIterRandomStates):
    """Used to generate the train/test splits using one or multiple random states.

    classificationIndices is a list of train/test splits."""
    indices = np.arange(len(labels))
    splits = []
    for randomState in statsIterRandomStates:
        # One stratified shuffle split per statistical iteration
        foldsObj = sklearn.model_selection.StratifiedShuffleSplit(n_splits=1,
                                                                  random_state=randomState,
                                                                  test_size=splitRatio)
        folds = foldsObj.split(indices, labels)
        for fold in folds:
            train_fold, test_fold = fold
            trainIndices = indices[train_fold]
            testIndices = indices[test_fold]
            splits.append([trainIndices, testIndices])
    return splits
```
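%% Cell type:markdown id: tags:
Below is a small usage sketch of `genSplits`. The labels, the random states and the `splitRatio` value are purely illustrative assumptions; only the call signature comes from the function above.
%% Cell type:code id: tags:
``` python
import numpy as np

# Hypothetical toy labels: 10 examples, 2 balanced classes
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
# One random state per statistical iteration (here, two iterations)
statsIterRandomStates = [np.random.RandomState(42), np.random.RandomState(43)]

splits = genSplits(labels, splitRatio=0.2, statsIterRandomStates=statsIterRandomStates)
for trainIndices, testIndices in splits:
    # 8 training indices and 2 test indices, one test example per class
    print(trainIndices, testIndices)
```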
%% Cell type:markdown id: tags:
## Multiclass problems
%% Cell type:markdown id: tags:
To be able to use the platform on multiclass problems, a one-versus-one method is implemented.
In order to use one-versus-one, we need to modify the train/test splits generated by the previously described function.
If the problem the platform is asked to solve is multiclass, it will generate all the possible two-class combinations in order to divide the main problem into multiple biclass ones.
In order to adapt each split, the `genMulticlassLabels` function will create new train/test splits (a code sketch of this procedure is given after this description) by:
* Generating an `oldIndices` list containing all the example indices whose label is in the combination,
* Generating a new train split by selecting only the indices of `trainIndices` whose labels are in the combination,
* Doing the same thing for the test indices,
* Copying the old `testIndices` variable into a new one called `testIndicesMulticlass` that will be used to predict on the entire dataset once the algorithm has learned to distinguish the two classes,
* Generating a new `labels` array by replacing all the labels that are not in the combination with -100 to flag them as unseen during the training phase.
The function will then return a triplet:
* `multiclassLabels` is a list containing, for each combination, the newly generated labels with ones and zeros for the two labels of the combination and -100 for the others,
* `labelsIndices` is a list containing all the combinations,
* `indicesMulticlass` is a list containing triplets for each statistical iteration:
  * `trainIndices` are the indices used for training, picked only in the two classes of the combination (at the second step of the previous list),
  * `testIndices` are the indices used for testing the biclass-generalization capacity of each biclass classifier learned on `trainIndices`, picked only in the two classes of the combination (at the third step of the previous list),
  * `testIndicesMulticlass` are the indices described at the fourth step of the previous list.
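%% Cell type:markdown id: tags:
The following cell is a minimal sketch of this adaptation, written to follow the description above. It is not the platform's actual `genMulticlassLabels` implementation, and the function name, exact return layout and variable names are assumptions.
%% Cell type:code id: tags:
``` python
import itertools
import numpy as np


def genMulticlassLabels_sketch(labels, splits):
    """Sketch: derive biclass splits and labels for each two-class combination."""
    labels = np.asarray(labels)
    combinations = list(itertools.combinations(sorted(set(labels)), 2))
    multiclassLabels, labelsIndices, indicesMulticlass = [], [], []
    for combination in combinations:
        iterationSplits = []
        for trainIndices, testIndices in splits:
            # Keep only the indices whose label belongs to the combination
            newTrainIndices = np.array([i for i in trainIndices if labels[i] in combination])
            newTestIndices = np.array([i for i in testIndices if labels[i] in combination])
            # The whole old test set is kept to evaluate multiclass predictions later
            testIndicesMulticlass = np.array(testIndices)
            iterationSplits.append([newTrainIndices, newTestIndices, testIndicesMulticlass])
        # 1/0 for the two classes of the combination, -100 flags unseen labels
        newLabels = np.full(len(labels), -100, dtype=int)
        newLabels[labels == combination[0]] = 1
        newLabels[labels == combination[1]] = 0
        multiclassLabels.append(newLabels)
        labelsIndices.append(combination)
        indicesMulticlass.append(iterationSplits)
    return multiclassLabels, labelsIndices, indicesMulticlass
```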
%% Cell type:markdown id: tags:
## Cross-validation folds
%% Cell type:markdown id: tags:
The cross-validation folds are generated using a `StratifiedKFold` object from the `sklearn.model_selection` library.
* For all the **monoview** algorithms, these objects (one for each statistical iteration) are then fed to a `sklearn.model_selection.RandomizedSearchCV` object, so nothing custom is done about the cross-validation folds in the monoview case.
* In the **multiview** case, they are used in the `randomizedSearch` function of the `utils.HyperParametersSearch` module. There, they split the learning set with `multiviewFolds = KFolds.split(learningIndices, labels[learningIndices])` and are then iterated over with `for trainIndices, testIndices in multiviewFolds:` (a small sketch is given after this list).
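%% Cell type:markdown id: tags:
A minimal sketch of the multiview fold usage is given below. The labels, the learning indices and the number of folds are illustrative assumptions; only the `KFolds.split(...)` call mirrors the code quoted above.
%% Cell type:code id: tags:
``` python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical data: 20 examples with 2 balanced classes, 15 of them in the learning set
labels = np.array([0, 1] * 10)
learningIndices = np.arange(15)

# One StratifiedKFold object per statistical iteration (only one shown here)
KFolds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Split only the learning set while preserving the class proportions in each fold
multiviewFolds = KFolds.split(learningIndices, labels[learningIndices])
for trainIndices, testIndices in multiviewFolds:
    # trainIndices and testIndices are positions inside learningIndices
    print(learningIndices[trainIndices], learningIndices[testIndices])
```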
%% Cell type:code id: tags:
``` python
```
Result analysis module
======================
.. automodule:: multiview_platform.MonoMultiViewClassifiers.ResultAnalysis
:members:
:inherited-members:
\ No newline at end of file
@@ -5,9 +5,7 @@ Mono and mutliview classification
:maxdepth: 1
:caption: Contents:
monomutli/metrics
monomutli/monoexec
monomutli/multiexec
monomutli/monoclf
monomutli/multiclf
monomutli/utils
\ No newline at end of file
monomulti/metrics
monomulti/exec_classif
monomulti/result_analysis
monomulti/multiview_classifier
\ No newline at end of file
@@ -38,7 +38,7 @@ def initBenchmark(args):
allMonoviewAlgos = [name for _, name, isPackage in
pkgutil.iter_modules(['./MonoMultiViewClassifiers/MonoviewClassifiers'])
if (not isPackage)]
if (not isPackage) and name not in ["framework"]]
benchmark["Monoview"] = allMonoviewAlgos
benchmark["Multiview"] = dict((multiviewPackageName, "_") for multiviewPackageName in allMultiviewPackages)
for multiviewPackageName in allMultiviewPackages:
@@ -389,7 +389,7 @@ def execClassif(arguments):
metrics = [metric.split(":") for metric in args.CL_metrics]
if metrics == [[""]]:
metricsNames = [name for _, name, isPackage
in pkgutil.iter_modules(['./MonoMultiViewClassifiers/Metrics']) if not isPackage and name not in ["log_loss", "matthews_corrcoef", "roc_auc_score"]]
in pkgutil.iter_modules(['./MonoMultiViewClassifiers/Metrics']) if not isPackage and name not in ["framework", "log_loss", "matthews_corrcoef", "roc_auc_score"]]
metrics = [[metricName] for metricName in metricsNames]
metrics = arangeMetrics(metrics, args.CL_metric_princ)
for metricIndex, metric in enumerate(metrics):
......
@@ -24,11 +24,9 @@ Define a getConfig function
"""
import os
modules = []
for module in os.listdir(os.path.dirname(os.path.realpath(__file__))):
if module in ['__init__.py', 'framework.py'] or module[-3:] != '.py':
if module in ['__init__.py'] or module[-3:] != '.py':
continue
__import__(module[:-3], locals(), globals(), [], 1)
pass
del module
del os
\ No newline at end of file
"""In ths file, we explain how to add a metric to the platform.
In order to do that, on needs to add a file with the following functions
which are mandatory for the metric to work with the platform.
"""
# Author-Info
__author__ = "Baptiste Bauvin"
__status__ = "Prototype" # Production, Development, Prototype
def score(y_true, y_pred, multiclass=False, **kwargs):
    """Get the metric's score from the ground truth (``y_true``) and predictions (``y_pred``).

    Parameters
    ----------
    y_true : array-like, shape = (n_samples,)
        Target values (class labels).
    y_pred : array-like, shape = (n_samples,)
        Predicted target values (class labels).
    multiclass : boolean (default=False)
        Parameter specifying whether the target values are multiclass or not.
    kwargs : dict
        The arguments stored in this dictionary must be keyed by strings of
        integers ("0", "1", ...) and are interpreted inside the function.

    Returns
    -------
    score : float
        Returns the score of the prediction.
    """
    score = 0.0
    return score
def get_scorer(**kwargs):
    """Get the metric's scorer, as in the ``sklearn.metrics`` package.

    Parameters
    ----------
    kwargs : dict
        The arguments stored in this dictionary must be keyed by strings of
        integers ("0", "1", ...) and are interpreted inside the function. These
        arguments are a configuration of the metric.

    Returns
    -------
    scorer : object
        Callable object that returns a scalar score; greater is better
        (cf. ``sklearn.metrics.make_scorer``).
    """
    scorer = None
    return scorer
def getConfig(**kwargs):
    """Get the metric's configuration as a string.

    Parameters
    ----------
    kwargs : dict
        The arguments stored in this dictionary must be keyed by strings of
        integers ("0", "1", ...) and are interpreted inside the function. These
        arguments are a configuration of the metric.

    Returns
    -------
    configString : string
        The string describing the metric's configuration.
    """
    configString = "This is a framework"
    return configString
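As an illustration, here is a minimal sketch of what a concrete metric module following this framework could look like. It wraps scikit-learn's `accuracy_score`; the configuration string and the handling of `kwargs` are simplified assumptions, not the platform's actual accuracy module.
``` python
# Minimal sketch of a concrete metric module built on the framework above.
# It simply wraps scikit-learn's accuracy score and ignores the kwargs configuration.
from sklearn.metrics import accuracy_score, make_scorer


def score(y_true, y_pred, multiclass=False, **kwargs):
    """Return the accuracy of ``y_pred`` with respect to ``y_true``."""
    return accuracy_score(y_true, y_pred)


def get_scorer(**kwargs):
    """Return a callable scorer, as built by ``sklearn.metrics.make_scorer``."""
    return make_scorer(accuracy_score, greater_is_better=True)


def getConfig(**kwargs):
    """Return a string describing the metric's configuration."""
    return "Accuracy score (no configuration parameters)"
```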