Commit a4e4d897 authored by Baptiste Bauvin

Did a lot of doc on sphinx

parent d429937e
......@@ -27,6 +27,8 @@
#
# needs_sphinx = '1.0'
add_module_names = False
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
......@@ -34,6 +36,8 @@ extensions = ['sphinx.ext.autodoc',
'sphinx.ext.doctest',
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'numpydoc',
'nbsphinx',
'sphinx.ext.coverage',
'sphinx.ext.mathjax',
'sphinx.ext.ifconfig',
......
......@@ -11,7 +11,7 @@ Welcome to MultiviewPlatform's documentation!
:caption: Contents:
api
examples
.. examples
......
Classification execution module
===============================
.. automodule:: multiview_platform.MonoMultiViewClassifiers.ExecClassif
:members:
:inherited-members:
\ No newline at end of file
Welcome to the metrics documentation!
=============================================
Metrics framework
=================
.. automodule:: multiview_platform.Metrics.framework
.. automodule:: multiview_platform.MonoMultiViewClassifiers.Metrics.framework
:members:
:inherited-members:
\ No newline at end of file
%% Cell type:markdown id: tags:
# How to add a multiview classifier to the platform
%% Cell type:markdown id: tags:
## File addition
%% Cell type:markdown id: tags:
* In the `Code/MonoMultiViewClassifiers/MultiviewClassifiers` package, add a new package named after your multiview classifier (let's call it NMC for New Multiview Classifier).
* In this package (`Code/MonoMultiViewClassifiers/MultiviewClassifiers/NMC`), add a file called `NMCModule.py` and another one called `analyzeResults.py`. These will be the two files used by the platform to communicate with your implementation.
* You can now add either a package named after your classifier (`NMCPackage`) and paste your implementation files in it, or just add a single file with the same name if that is enough.
%% Cell type:markdown id: tags:
## `NMCModule.py`
%% Cell type:markdown id: tags:
Here we list all the functions that the Python module must provide so that the platform can use NMC.
%% Cell type:markdown id: tags:
### The functions
%% Cell type:markdown id: tags:
#### `getArgs`
%% Cell type:markdown id: tags:
This function is used to generate multiple argument dictionaries from one benchmark entry. It must return the `argumentsList`, to which it must have added at least one dictionary containing all the information necessary to run NMC. You must add all the general fields describing the type of classifier and a field called `NMCKWARGS` (`<classifier_name>KWARGS`) containing another dictionary with the classifier-specific arguments (we assume here that NMC has two hyper-parameters: a set of weights and an integer).
%% Cell type:code id: tags:
``` python
arguments = {"CL_type":"NMC",
"views":["all", "the", "views", "names"],
"NB_VIEW":len(["all", "the", "views", "names"]),
"viewsIndices":["the indices", "of the", "views in", "the hdf5 file"],
"NB_CLASS": "the number of labels of the dataset",
"LABLELS_NAMES": ["the names of", "the labels used"],
"NMCKWARGS":{"weights":[],
"integer":42,
"nbViews":5}
}
```
%% Cell type:markdown id: tags:
To fill these fields, you can use the default values passed as arguments to the function:
%% Cell type:code id: tags:
``` python
def getArgs(args, benchmark, views, viewsIndices, randomState, directory, resultsMonoview, classificationIndices):
    argumentsList = []
    nbViews = len(views)
    arguments = {"CL_type": "NMC",
                 "views": views,
                 "NB_VIEW": len(views),
                 "viewsIndices": viewsIndices,
                 "NB_CLASS": len(args.CL_classes),
                 "LABELS_NAMES": args.CL_classes,
                 "NMCKWARGS": {"weights": [],
                               "integer": 42,
                               "nbViews": nbViews}}
    argumentsList.append(arguments)
    return argumentsList
```
%% Cell type:markdown id: tags:
This function is also used to add the user-defined configuration for the classifier, but we will discuss it later
%% Cell type:markdown id: tags:
#### `genName`
%% Cell type:markdown id: tags:
This function is used to generate a short string describing the classifier using its configuration.
%% Cell type:code id: tags:
``` python
def genName(config):
    return "NMC"
```
%% Cell type:markdown id: tags:
Some classifiers, like some late fusion classifiers, will have more complicated `genName` functions that need to summarize in a short string which monoview classifiers they use, based on the `config` argument, which is exactly the dictionary called `"NMCKWARGS"` in the `getArgs` function.
%% Cell type:markdown id: tags:
#### `getBenchmark`
%% Cell type:markdown id: tags:
This function is used to generate the `benchmark` argument of `getArgs`. It stores all the different configurations that will have to be tested (it does not include hyper-parameter sets). For example, for the Mumbo classifier, it will store the list of possible algorithms to use as weak learners.
%% Cell type:code id: tags:
``` python
def getBenchmark(benchmark, args=None):
    benchmark["Multiview"]["NMC"] = ["Some NMC configurations"]
    return benchmark
```
%% Cell type:markdown id: tags:
The `benchmark` argument is pre-generated with an entry for all the multiview classifiers, so you just need to fill it with the different configurations.
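%% Cell type:markdown id: tags:
As a purely illustrative sketch (the exact pre-generated content depends on the classifiers installed, so the monoview names below are assumptions), calling `getBenchmark` on such a dictionary simply fills the `"NMC"` entry:
%% Cell type:code id: tags:
``` python
# Hypothetical pre-generated benchmark dictionary (the platform builds this automatically)
benchmark = {"Monoview": ["DecisionTree", "Adaboost"],
             "Multiview": {"NMC": "_"}}

benchmark = getBenchmark(benchmark)
print(benchmark["Multiview"]["NMC"])  # ['Some NMC configurations']
```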
%% Cell type:markdown id: tags:
#### `genParamsSets`
%% Cell type:markdown id: tags:
This function is used to generate random hyper-parameter sets so that a randomized search can estimate the best one. It works in pair with the `setParams` method implemented in the classifier's class, so you need to keep in mind the order of the hyper-parameters you use here.
The `classificationKWARGS` argument is the `"NMCKWARGS"` entry seen earlier, and it is highly recommended to use the `randomState` object (described [here](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.RandomState.html)) to generate random numbers, so that the results are reproducible.
Assuming our NMC classifier has two hyper-parameters, one weight per view and one integer between 1 and 100, the `genParamsSets` function will look like:
%% Cell type:code id: tags:
``` python
import numpy as np

def genParamsSets(classificationKWARGS, randomState, nIter=1):
    weightsVector = [randomState.random_sample(classificationKWARGS["nbViews"]) for _ in range(nIter)]
    normalizedWeights = [weights/np.sum(weights) for weights in weightsVector]
    intsVector = list(randomState.randint(1, 100, nIter))
    paramsSets = [[normalizedWeight, integer] for normalizedWeight, integer in zip(normalizedWeights, intsVector)]
    return paramsSets
```
%% Cell type:markdown id: tags:
### The `NMC` class
%% Cell type:markdown id: tags:
It has to be named after the classifier, with `Class` appended to its name.
%% Cell type:code id: tags:
``` python
class NMCClass:
pass
```
%% Cell type:markdown id: tags:
#### `init` method
%% Cell type:markdown id: tags:
There is nothing specific to define in the `__init__` method; you just need to initialize the attributes of your classifier. The `kwargs` argument is the `NMCKWARGS` dictionary seen earlier. In our example, NMC uses two hyper-parameters: a weight vector and an integer.
%% Cell type:code id: tags:
``` python
def __init__(self, randomState, NB_CORES=1, **kwargs):
    if kwargs["weights"] == []:
        # If no weights were given, draw them at random (one per view)
        self.weights = randomState.random_sample(kwargs["nbViews"])
    else:
        self.weights = kwargs["weights"]
    self.weights /= np.sum(self.weights)
    self.integer = kwargs["integer"]
```
%% Cell type:markdown id: tags:
#### `setParams` method
%% Cell type:markdown id: tags:
This method is used to tune your classifier with a set of hyper-parameters. Its input, `paramsSet`, is a list of parameter values ordered as in the `genParamsSets` function seen earlier.
%% Cell type:code id: tags:
``` python
def setParams(self, paramsSet):
self.weights = paramsSet[0]
self.integer = paramsSet[1]
```
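%% Cell type:markdown id: tags:
To make the pairing between `genParamsSets` and `setParams` explicit, here is a small hedged usage sketch (it assumes the `__init__` and `setParams` methods above have been added to `NMCClass`, and that `numpy` is imported as `np`):
%% Cell type:code id: tags:
``` python
import numpy as np

randomState = np.random.RandomState(42)
classificationKWARGS = {"weights": [], "integer": 42, "nbViews": 5}

classifier = NMCClass(randomState, **classificationKWARGS)
# Draw 10 random hyper-parameter sets and apply the first one
paramsSets = genParamsSets(classificationKWARGS, randomState, nIter=10)
classifier.setParams(paramsSets[0])  # paramsSets[0] is [weights, integer], in that order
```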
%% Cell type:markdown id: tags:
#### `fit_hdf5` method
%% Cell type:markdown id: tags:
This method plays the same role as `sklearn`'s `fit` method but takes an HDF5 dataset as input, in order to lower the memory usage of the whole platform.
* The `DATASET` object is an HDF5 dataset file containing all the views and labels.
* The `usedIndices` object is a `numpy` 1d-array containing the indices of the examples we want to learn from.
* The `viewsIndices` object is a `numpy` 1d-array containing the indices of the views we want to learn from.
%% Cell type:code id: tags:
``` python
def fit_hdf5(self, DATASET, usedIndices=None, viewsIndices=None):
# Call the fit function of your own module
pass
```
%% Cell type:markdown id: tags:
#### `predict_hdf5` method
%% Cell type:markdown id: tags:
This method is the HDF5-compatible counterpart of `sklearn`'s `predict` method. It takes the same inputs as the `fit_hdf5` method but returns a 1d-array containing the predicted labels of the requested examples (ordered as in `usedIndices`).
%% Cell type:code id: tags:
``` python
def predict_hdf5(self, DATASET, usedIndices=None, viewsIndices=None):
# Call the predict function of your own module
predictedLabels = None # Just to avoid any ipynb running error
return predictedLabels
```
%% Cell type:markdown id: tags:
Once you've added everything to the `NMCModule.py` file, you are close to being able to run your algorithm on the platform; you just need to fill in the `analyzeResults.py` file.
%% Cell type:markdown id: tags:
## `analyzeResults.py`
%% Cell type:markdown id: tags:
The `analyzeResults.py` file is a module used to produce a result analysis specific to your classifier. In order to run the platform, you have to add a unique function called `execute` that runs the analysis and returns three variables:
* `stringAnalysis` is a string that will be saved in a file to describe the classifier and its performance, and may give some insight into the interpretation of its way of classifying.
* `imagesAnalysis` is a dictionary where you can store images (as values) describing the classifier; the keys will be the image names.
* `metricsScores` is a dictionary where the values are lists containing the train and test scores and the keys are the metric names (e.g. `metricsScores = {"accuracy_score":[0.99, 0.10]}`).
The `execute` function takes the following inputs:
* `classifier` is a classifier object from your classifier's class
* `trainLabels` are the labels predicted for the train set by the classifier
* `testLabels` are the labels predicted for the test set by the classifier
* `DATASET` is the HDF5 dataset object
* `classificationKWARGS` is the dictionary named `NMCKWARGS` earlier
* `classificationIndices` is a triplet containing the learning indices, the validation indices and the test indices for multiclass classification
* `LABELS_DICTIONARY` is a dictionary containing a label as a key and its name as a value
* `views` is the list of the names of the views used by the classifier
* `nbCores` is an `int` fixing the number of threads used by the platform
* `times` is a tuple containing the extraction time and the classification time
* `name` is the name of the database on which the platform is running
* `KFolds` is an `sklearn` k-fold object used for the cross-validation
* `hyperParamSearch` is the type of hyper-parameter optimization method used
* `nIter` is the number of iterations of the hyper-parameter search
* `metrics` is the list of the metrics and their arguments
* `viewsIndices` is a 1d-array of the indices of the views used for classification
* `randomState` is a `numpy` RandomState object
* `labels` are the ground truth labels of the dataset
%% Cell type:markdown id: tags:
The basic function analyzing results for all the classifiers looks like :
%% Cell type:code id: tags:
``` python
from ... import Metrics
from ...utils.MultiviewResultAnalysis import printMetricScore, getMetricsScores
def execute(classifier, trainLabels,
testLabels, DATASET,
classificationKWARGS, classificationIndices,
LABELS_DICTIONARY, views, nbCores, times,
name, KFolds,
hyperParamSearch, nIter, metrics,
viewsIndices, randomState, labels):
CLASS_LABELS = labels
learningIndices, validationIndices, testIndicesMulticlass = classificationIndices
metricModule = getattr(Metrics, metrics[0][0])
if metrics[0][1] is not None:
metricKWARGS = dict((index, metricConfig) for index, metricConfig in enumerate(metrics[0][1]))
else:
metricKWARGS = {}
    scoreOnTrain = metricModule.score(CLASS_LABELS[learningIndices], trainLabels, **metricKWARGS)
scoreOnTest = metricModule.score(CLASS_LABELS[validationIndices], testLabels, **metricKWARGS)
# To be modified to fit to your classifier
classifierConfigurationString = "with weights : "+ ", ".join(map(str, list(classifier.weights))) + ", and integer : "+str(classifier.integer)
# Modify the name of the classifier in these strings
stringAnalysis = "\t\tResult for Multiview classification with NMC "+ \
"\n\n" + metrics[0][0] + " :\n\t-On Train : " + str(scoreOnTrain) + "\n\t-On Test : " + str(
scoreOnTest) + \
"\n\nDataset info :\n\t-Database name : " + name + "\n\t-Labels : " + \
', '.join(LABELS_DICTIONARY.values()) + "\n\t-Views : " + ', '.join(views) + "\n\t-" + str(
KFolds.n_splits) + \
" folds\n\nClassification configuration : \n\t-Algorithm used : NMC " + classifierConfigurationString
metricsScores = getMetricsScores(metrics, trainLabels, testLabels,
validationIndices, learningIndices, labels)
stringAnalysis += printMetricScore(metricsScores, metrics)
imagesAnalysis = {}
return stringAnalysis, imagesAnalysis, metricsScores
```
%% Cell type:markdown id: tags:
Once you have done this, your classifier is ready to be used by the platform, but you can enrich the `analyzeResults.py` file with more detailed, classifier-specific descriptions.
%% Cell type:markdown id: tags:
## Adding arguments to avoid hyper parameter optimization
%% Cell type:markdown id: tags:
In order to be able to test a specific set of arguments on the platform, you need to add some lines to the argument parser located in `Code/MonoMultiViewClassifiers/utils/execution.py`, in the `parseTheArgs` function. What you need to do is add a group of arguments allowing you to pass the hyper-parameters on the command line:
%% Cell type:code id: tags:
``` python
groupNMC = parser.add_argument_group('New Multiview Classifier arguments')
groupNMC.add_argument('--NMC_weights', metavar='FLOAT', action='store', nargs="+",
help='Determine the weights of NMC', type=float,
default=[])
groupNMC.add_argument('--NMC_integer', metavar='INT', action='store',
help='Determine the integer of NMC', type=int,
default=42)
```
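%% Cell type:markdown id: tags:
As a quick sanity check, the snippet below parses these two options on a standalone parser (the real platform builds a much larger parser in `parseTheArgs`, so this is only an illustration with made-up values):
%% Cell type:code id: tags:
``` python
import argparse

parser = argparse.ArgumentParser()
groupNMC = parser.add_argument_group('New Multiview Classifier arguments')
groupNMC.add_argument('--NMC_weights', metavar='FLOAT', action='store', nargs="+",
                      help='Determine the weights of NMC', type=float,
                      default=[])
groupNMC.add_argument('--NMC_integer', metavar='INT', action='store',
                      help='Determine the integer of NMC', type=int,
                      default=42)

args = parser.parse_args(['--NMC_weights', '0.25', '0.75', '--NMC_integer', '12'])
print(args.NMC_weights, args.NMC_integer)  # [0.25, 0.75] 12
```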
%% Cell type:markdown id: tags:
In order for the platform to use these arguments, you need to modify the `getArgs` function of the file `NMCModule.py`.
%% Cell type:code id: tags:
``` python
def getArgs(args, benchmark, views, viewsIndices, randomState, directory, resultsMonoview, classificationIndices):
    argumentsList = []
    nbViews = len(views)
    arguments = {"CL_type": "NMC",
                 "views": views,
                 "NB_VIEW": len(views),
                 "viewsIndices": viewsIndices,
                 "NB_CLASS": len(args.CL_classes),
                 "LABELS_NAMES": args.CL_classes,
                 "NMCKWARGS": {"weights": args.NMC_weights,  # Modified to take the args into account
                               "integer": args.NMC_integer,  # Modified to take the args into account
                               "nbViews": nbViews}}
    argumentsList.append(arguments)
    return argumentsList
```
%% Cell type:markdown id: tags:
# Randomized example selection for classification
%% Cell type:markdown id: tags:
## Train/test split generation
%% Cell type:markdown id: tags:
The train/test splits are generated by the `execution.genSplits` function. Its task is to generate a train/test split for each statistical iteration. In order to do that, it is fed the following inputs:
* `labels` are the labels of the whole dataset,
* `splitRatio` is a real number giving the ratio |test|/|all|,
* `statsIterRandomStates` is a list of `numpy` random states used to generate reproducible pseudo-random numbers.
The main operation in this function is done by `sklearn.model_selection.StratifiedShuffleSplit`, which returns folds made by preserving the percentage of samples of each class.
In this case we ask it to split the dataset into two subsets with the requested test size. It then returns a shuffled train/test split while preserving the percentage of samples of each class.
We store the example indices in two `np.array`s called `trainIndices` and `testIndices`.
All the algorithms will then train (hyper-parameter cross-validation & learning) on `trainIndices`.
%% Cell type:code id: tags:
``` python
def genSplits(labels, splitRatio, statsIterRandomStates):
"""Used to gen the train/test splits using one or multiple random states
classificationIndices is a list of train/test splits"""
indices = np.arange(len(labels))
splits = []
for randomState in statsIterRandomStates:
foldsObj = sklearn.model_selection.StratifiedShuffleSplit(n_splits=1,
random_state=randomState,
test_size=splitRatio)
folds = foldsObj.split(indices, labels)
for fold in folds:
train_fold, test_fold = fold
trainIndices = indices[train_fold]
testIndices = indices[test_fold]
splits.append([trainIndices, testIndices])
return splits
```
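%% Cell type:markdown id: tags:
Here is a small hedged usage sketch of `genSplits` on made-up labels, with three statistical iterations and a 20% test ratio (it assumes `numpy` and `sklearn.model_selection` are imported as in the platform):
%% Cell type:code id: tags:
``` python
import numpy as np
import sklearn.model_selection

labels = np.array([0] * 50 + [1] * 50)  # made-up balanced labels
statsIterRandomStates = [np.random.RandomState(seed) for seed in [1, 2, 3]]

splits = genSplits(labels, splitRatio=0.2, statsIterRandomStates=statsIterRandomStates)
trainIndices, testIndices = splits[0]
print(len(trainIndices), len(testIndices))  # 80 20, class proportions preserved in both subsets
```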
%% Cell type:markdown id: tags:
## Multiclass problems
%% Cell type:markdown id: tags:
To be able to use the platform on multiclass problems, a one-versus-one method is implemented.
In order to use one-versus-one, we need to modify the train/test splits generated by the previously described function.
If the problem the platform is asked to solve is multiclass, it will generate all the possible two-class combinations to divide the main problem into multiple biclass ones.
In order to adapt each split, the `genMulticlassLabels` function will create new train/test splits by:
* Generating an `oldIndices` list containing all the example indices whose label is in the combination
* Generating a new train split by selecting only the indices of `trainIndices` whose labels are in the combination
* Doing the same thing for the test indices
* Copying the old `testIndices` variable into a new one called `testIndicesMulticlass` that will be used to predict on the entire dataset once the algorithm has learned to distinguish two classes
* Generating a new `labels` array by replacing all the labels that are not in the combination by -100 to flag them as unseen during the training phase.
The function then returns a triplet (a sketch of the per-combination filtering is given after this list):
* `multiclassLabels` is a list containing, for each combination, the newly generated labels, with ones and zeros for the labels in the combination and -100 for the others.
* `labelsIndices` is a list containing all the combinations,
* `indicesMulticlass` is a list containing, for each statistical iteration, a triplet:
    * `trainIndices` are the indices used for training, picked only in the two classes of the combination (second step of the previous list),
    * `testIndices` are the indices used for testing the biclass generalization capacity of each biclass classifier learned on `trainIndices`, also picked only in the two classes of the combination (third step of the previous list),
    * `testIndicesMulticlass` are the indices described at the fourth step of the previous list.
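%% Cell type:markdown id: tags:
The following is a minimal sketch of the per-combination filtering and relabeling described above; it is not the platform's exact `genMulticlassLabels` code, and the helper name is hypothetical. It only illustrates the steps for a single combination and a single train/test split:
%% Cell type:code id: tags:
``` python
import numpy as np

def biclassSplitSketch(labels, trainIndices, testIndices, combination):
    """labels is a 1d numpy array of int labels, combination a pair of label values."""
    c0, c1 = combination
    # Keep only the train/test indices whose label belongs to the combination
    newTrainIndices = np.array([index for index in trainIndices if labels[index] in (c0, c1)])
    newTestIndices = np.array([index for index in testIndices if labels[index] in (c0, c1)])
    # The full test set is kept aside for multiclass prediction
    testIndicesMulticlass = np.array(testIndices)
    # Relabel: first class of the combination -> 0, second -> 1, others -> -100 (unseen at training)
    newLabels = np.full(len(labels), -100, dtype=int)
    newLabels[labels == c0] = 0
    newLabels[labels == c1] = 1
    return newTrainIndices, newTestIndices, testIndicesMulticlass, newLabels
```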
%% Cell type:markdown id: tags:
## Cross-validation folds
%% Cell type:markdown id: tags:
The cross-validation folds are generated using a `StratifiedKFold` object from the `sklearn.model_selection` library.
* For all the **monoview** algorithms, these objects (one for each statistical iteration) are then fed to a `sklearn.model_selection.RandomizedSearchCV` object, so there is no custom cross-validation handling in the monoview case.
* In the **multiview** case, they are used in the `utils.HyperParametersSearch` module, in the `randomizedSearch` function. There, they split the learning set with `multiviewFolds = KFolds.split(learningIndices, labels[learningIndices])` and are then iterated over with `for trainIndices, testIndices in multiviewFolds:` (a runnable sketch is given below).
%% Cell type:code id: tags:
``` python
```
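%% Cell type:markdown id: tags:
Below is a runnable sketch of how the multiview folds are generated and iterated over; the variable names follow the description above, but the data itself is a made-up assumption:
%% Cell type:code id: tags:
``` python
import numpy as np
from sklearn.model_selection import StratifiedKFold

labels = np.array([0, 1] * 30)       # made-up labels for 60 examples
learningIndices = np.arange(40)      # indices of the learning set
KFolds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

multiviewFolds = KFolds.split(learningIndices, labels[learningIndices])
for trainIndices, testIndices in multiviewFolds:
    # trainIndices and testIndices are positions inside learningIndices,
    # each fold preserving the class proportions of labels[learningIndices]
    pass
```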
Result analysis module
======================
.. automodule:: multiview_platform.MonoMultiViewClassifiers.ResultAnalysis
:members:
:inherited-members:
\ No newline at end of file
......@@ -5,9 +5,7 @@ Mono and mutliview classification
:maxdepth: 1
:caption: Contents:
monomutli/metrics
monomutli/monoexec
monomutli/multiexec
monomutli/monoclf
monomutli/multiclf
monomutli/utils
\ No newline at end of file
monomulti/metrics
monomulti/exec_classif
monomulti/result_analysis
monomulti/multiview_classifier
\ No newline at end of file
......@@ -38,7 +38,7 @@ def initBenchmark(args):
allMonoviewAlgos = [name for _, name, isPackage in
pkgutil.iter_modules(['./MonoMultiViewClassifiers/MonoviewClassifiers'])
if (not isPackage)]
if (not isPackage) and name not in ["framework"]]
benchmark["Monoview"] = allMonoviewAlgos
benchmark["Multiview"] = dict((multiviewPackageName, "_") for multiviewPackageName in allMultiviewPackages)
for multiviewPackageName in allMultiviewPackages:
......@@ -389,7 +389,7 @@ def execClassif(arguments):
metrics = [metric.split(":") for metric in args.CL_metrics]
if metrics == [[""]]:
metricsNames = [name for _, name, isPackage
in pkgutil.iter_modules(['./MonoMultiViewClassifiers/Metrics']) if not isPackage and name not in ["log_loss", "matthews_corrcoef", "roc_auc_score"]]
in pkgutil.iter_modules(['./MonoMultiViewClassifiers/Metrics']) if not isPackage and name not in ["framework", "log_loss", "matthews_corrcoef", "roc_auc_score"]]
metrics = [[metricName] for metricName in metricsNames]
metrics = arangeMetrics(metrics, args.CL_metric_princ)
for metricIndex, metric in enumerate(metrics):
......
......@@ -24,11 +24,9 @@ Define a getConfig function
"""
import os
modules = []
for module in os.listdir(os.path.dirname(os.path.realpath(__file__))):
if module in ['__init__.py', 'framework.py'] or module[-3:] != '.py':
if module in ['__init__.py'] or module[-3:] != '.py':
continue
__import__(module[:-3], locals(), globals(), [], 1)
pass
del module
del os
\ No newline at end of file
"""In ths file, we explain how to add a metric to the platform.
In order to do that, on needs to add a file with the following functions
which are mandatory for the metric to work with the platform.
"""
# Author-Info
__author__ = "Baptiste Bauvin"
__status__ = "Prototype" # Production, Development, Prototype
def score(y_true, y_pred, multiclass=False, **kwargs):
"""Get the metric's score from the ground truth (``y_true``) and predictions (``y_pred``).
Parameters
----------
y_true : array-like, shape = (n_samples,)
Target values (class labels).
y_pred : array-like, shape = (n_samples,)
Predicted target values (class labels).
multiclass : boolean (default=False)
Parameter specifying whether the target values are multiclass or not.
kwargs : dict
The arguments stored in this dictionary must be keyed by strings of
integers ("0", "1", etc.) and decoded inside the function.
Returns
-------
score : float
Returns the score of the prediction.
"""
score = 0.0
return score
def get_scorer(**kwargs):
"""Get the metric's scorer as in the sklearn.metrics package.
Parameters
----------
kwargs : dict
The arguments stored in this dictionary must be keyed by strings of
integers ("0", "1", etc.) and decoded inside the function. These arguments
are a configuration of the metric.
Returns
-------
scorer : object
Callable object that returns a scalar score; greater is better. (cf sklearn.metrics.make_scorer)
"""
scorer = None
return scorer
def getConfig(**kwargs):
"""Get the metric's configuration as a string.
Parameters
----------
kwargs : dict
The arguments stored in this dictionary must be keyed by strings of
integers ("0", "1", etc.) and decoded inside the function. These arguments
are a configuration of the metric.
Returns
-------
configString : string
The string describing the metric's configuration.
"""
configString = "This is a framework"
return configString