How to add a multiview classifier to the platform
File addition
In the Code/MonoMultiViewClassifiers/MultiviewClassifiers package, add a new package named after your multiview classifier (let's call it NMC, for New Multiview Classifier). In this package (Code/MonoMultiViewClassifiers/MultiviewClassifiers/NMC), add a file called NMCModule.py and another one called analyzeResults.py. These are the two files the platform uses to communicate with your implementation. You can then either add a package named after your classifier (NMCPackage) and paste your implementation files in it, or just add a single file with the same name if that is enough; see the layout sketch below.
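For our NMC example, the resulting layout would look like the sketch below. The __init__.py is an assumption (Python normally needs one to treat the directory as a package), and NMCPackage/ only appears if you ship your implementation as several files:
Code/MonoMultiViewClassifiers/MultiviewClassifiers/NMC/
    __init__.py
    NMCModule.py
    analyzeResults.py
    NMCPackage/        # optional: your own implementation files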
NMCModule.py
Here we list all the functions the Python module must provide so that the platform can use NMC.
The functions
getArgs
This function is used to generate multiple argument dictionaries from one benchmark entry. It must return argumentsList, to which it must have appended at least one dictionary containing all the information needed to run NMC: the general fields describing the classifier (its type, the names and indices of the views it uses, the number of classes and the label names) and a field called NMCKWARGS (<classifier_name>KWARGS) containing another dictionary with the classifier-specific arguments (we assume here that NMC has two hyper-parameters: a set of weights and an integer).
To fill these fields, you can use the default values given as arguments of the function:
def getArgs(args, benchmark, views, viewsIndices, randomState, directory, resultsMonoview, classificationIndices):
    argumentsList = []
    nbViews = len(views)
    arguments = {"CL_type": "NMC",
                 "views": views,
                 "NB_VIEW": nbViews,
                 "viewsIndices": viewsIndices,
                 "NB_CLASS": len(args.CL_classes),
                 "LABELS_NAMES": args.CL_classes,
                 "NMCKWARGS": {"weights": [],
                               "integer": 42,
                               "nbViews": nbViews}}
    argumentsList.append(arguments)
    return argumentsList
This function is also used to add the user-defined configuration for the classifier; we will come back to that later.
genName
This function is used to generate a short string describing the classifier using its configuration.
def genName(config):
    return "NMC"
Some classifiers, like some late fusion classifiers, will have more complicated genName functions that need to summarize in a short string which monoview classifiers they use. They do so through the config argument, which is exactly the dictionary called "NMCKWARGS" in the getArgs function; an example sketch is given below.
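A minimal sketch of such a genName, assuming the configuration stores the monoview classifier names under a "classifiersNames" key (this key is an assumption for the example, not part of NMC):
def genName(config):
    # Hypothetical late-fusion example: shorten each monoview classifier name
    # and join them so the benchmark results stay readable
    # ("classifiersNames" is an assumed key of the configuration dictionary)
    return "Fusion-" + "-".join(name[:4] for name in config["classifiersNames"])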
getBenchmark
This function is used to generate the benchmark argument of getArgs. It stores all the different configurations that will have to be tested (this does not include hyper-parameter sets). For example, for the Mumbo classifier, it stores the list of possible algorithms to use as weak learners.
def getBenchmark(benchmark, args=None):
    benchmark["Multiview"]["NMC"] = ["Some NMC configurations"]
    return benchmark
The benchmark argument is pre-generated with an entry for each multiview classifier, so you just need to fill it with the different configurations, as sketched below.
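To make the expected structure explicit, here is a sketch of the benchmark dictionary after getBenchmark has run (the "Monoview" entry and the other multiview entries are placeholders, not actual platform values):
benchmark = {
    "Monoview": {},  # monoview part of the benchmark, filled elsewhere
    "Multiview": {
        "NMC": ["Some NMC configurations"],  # the entry filled by our getBenchmark
        # ... entries generated by the other multiview classifiers
    },
}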
genParamsSets
This function is used to generate random hyper-parameter sets so that a randomized search can estimate the best one. It works hand in hand with the setParams method implemented in the classifier's class, so you need to keep in mind the order of the hyper-parameters you use here.
The classificationKWARGS argument is the "NMCKWARGS" entry seen earlier, and it is highly recommended to use the randomState object (a numpy RandomState) to generate random numbers so that the results are reproducible.
Assuming our NMC classifier has two hyper-parameters, a weight vector with one weight per view and an integer between 1 and 100, the genParamsSets function will look like:
import numpy as np

def genParamsSets(classificationKWARGS, randomState, nIter=1):
    # Draw nIter random weight vectors (one weight per view) and normalize them
    weightsVectors = [randomState.random_sample(classificationKWARGS["nbViews"]) for _ in range(nIter)]
    normalizedWeights = [weights / np.sum(weights) for weights in weightsVectors]
    # Draw nIter random integers
    intsVector = list(randomState.randint(1, 100, nIter))
    # Keep the order expected by setParams: [weights, integer]
    paramsSets = [[normalizedWeight, integer] for normalizedWeight, integer in zip(normalizedWeights, intsVector)]
    return paramsSets
The NMC class
It has to be named after the classifier, with Class appended at the end of its name.
class NMCClass:
    pass
__init__ method
There is nothing specific to define in the __init__ method; you just need to initialize the attributes of your classifier. The kwargs argument is the NMCKWARGS dictionary seen earlier. In our example, NMC uses two hyper-parameters: a weight vector and an integer.
def __init__(self, randomState, NB_CORES=1, **kwargs):
    if kwargs["weights"] == []:
        # No user-defined weights: draw one random weight per view
        self.weights = randomState.random_sample(kwargs["nbViews"])
    else:
        self.weights = np.asarray(kwargs["weights"])
    self.weights /= np.sum(self.weights)
    self.integer = kwargs["integer"]
setParams method
This method is used to tune your classifier with a set of hyper-parameters. Its input is a list of parameter values, ordered as in the genParamsSets function seen earlier.
def setParams(self, paramsSet):
    self.weights = paramsSet[0]
    self.integer = paramsSet[1]
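To illustrate the pairing between genParamsSets and setParams, here is a small sketch, assuming the NMCClass and genParamsSets definitions above are in scope (the hyper-parameter values are arbitrary):
import numpy as np

randomState = np.random.RandomState(42)
classificationKWARGS = {"weights": [], "integer": 42, "nbViews": 5}

classifier = NMCClass(randomState, **classificationKWARGS)
paramsSets = genParamsSets(classificationKWARGS, randomState, nIter=10)
classifier.setParams(paramsSets[0])  # [weights, integer], in the order fixed by genParamsSets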
fit_hdf5 method
This method is generally the same as sklearn's fit method but takes an HDF5 dataset as input in order to lower the memory usage of the whole platform.
- The DATASET object is an HDF5 dataset file containing all the views and labels.
- The usedIndices object is a numpy 1d-array containing the indices of the examples we want to learn from.
- The viewsIndices object is a numpy 1d-array containing the indices of the views we want to learn from.
def fit_hdf5(self, DATASET, usedIndices=None, viewsIndices=None):
    # Call the fit function of your own module
    pass
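As a sketch of what the body usually starts with, assuming the platform's HDF5 file exposes the number of examples and views through the "Metadata" group (the attribute names below are assumptions, check your dataset file):
import numpy as np

def fit_hdf5(self, DATASET, usedIndices=None, viewsIndices=None):
    # Default to all examples / all views when no indices are given
    # ("datasetLength" and "nbView" are assumed attribute names of the Metadata group)
    if usedIndices is None:
        usedIndices = np.arange(DATASET.get("Metadata").attrs["datasetLength"])
    if viewsIndices is None:
        viewsIndices = np.arange(DATASET.get("Metadata").attrs["nbView"])
    # Then gather the selected examples of each selected view and call your own fit logic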
predict_hdf5 method
This method is the HDF5-compatible counterpart of sklearn's predict method. It has the same inputs as the fit_hdf5 method but returns a 1d-array containing the predicted labels of the requested examples (ordered as in usedIndices).
def predict_hdf5(self, DATASET, usedIndices=None, viewsIndices=None):
    # Call the predict function of your own module
    predictedLabels = None  # Just to avoid any ipynb running error
    return predictedLabels
Once you've added everything to the NMCModule.py file, you are close to being able to run your algorithm on the platform; you just need to fill in the analyzeResults.py file.
analyzeResults.py
The analyzeResults.py file is a module used to produce a result analysis specific to your classifier. In order to run the platform, you have to add a unique function called execute that runs the analysis and returns three variables:
- stringAnalysis is a string that will be saved in a file to describe the classifier and its performance, and may give some insights on the interpretation of its way of classifying.
- imagesAnalysis is a dictionary where you can store images (as values) describing the classifier; the keys will be the image names.
- metricsScores is a dictionary where the values are lists containing the train and test scores and the keys are the metric names (e.g. metricsScores = {"accuracy_score": [0.99, 0.10]}).
The execute function has as inputs:
- classifier is a classifier object from your classifier's class
- trainLabels are the labels predicted for the train set by the classifier
- testLabels are the labels predicted for the test set by the classifier
- DATASET is the HDF5 dataset object
- classificationKWARGS is the dictionary named NMCKWARGS earlier
- classificationIndices is a triplet containing the learning indices, the validation indices and the test indices for multiclass classification
- LABELS_DICTIONARY is a dictionary with each label as a key and its name as a value
- views is the list of the names of the views used by the classifier
- nbCores is an int fixing the number of threads used by the platform
- times is a tuple containing the extraction time and the classification time
- name is the name of the database on which the platform is running
- KFolds is an sklearn k-fold object used for the cross-validation
- hyperParamSearch is the type of the hyper-parameter optimization method
- nIter is the number of iterations of the hyper-parameter optimization method
- metrics is the list of the metrics and their arguments
- viewsIndices is a 1d-array of the indices of the views used for classification
- randomState is a numpy RandomState object
- labels are the ground truth labels of the dataset
The basic result-analysis function, common to all the classifiers, looks like:
from ... import Metrics
from ...utils.MultiviewResultAnalysis import printMetricScore, getMetricsScores


def execute(classifier, trainLabels,
            testLabels, DATASET,
            classificationKWARGS, classificationIndices,
            LABELS_DICTIONARY, views, nbCores, times,
            name, KFolds,
            hyperParamSearch, nIter, metrics,
            viewsIndices, randomState, labels):
    CLASS_LABELS = labels
    learningIndices, validationIndices, testIndicesMulticlass = classificationIndices

    # Compute the score of the first metric on the train and test sets
    metricModule = getattr(Metrics, metrics[0][0])
    if metrics[0][1] is not None:
        metricKWARGS = dict((index, metricConfig) for index, metricConfig in enumerate(metrics[0][1]))
    else:
        metricKWARGS = {}
    scoreOnTrain = metricModule.score(CLASS_LABELS[learningIndices], trainLabels, **metricKWARGS)
    scoreOnTest = metricModule.score(CLASS_LABELS[validationIndices], testLabels, **metricKWARGS)

    # To be modified to fit your classifier
    classifierConfigurationString = "with weights : " + ", ".join(map(str, list(classifier.weights))) + ", and integer : " + str(classifier.integer)

    # Modify the name of the classifier in these strings
    stringAnalysis = "\t\tResult for Multiview classification with NMC " + \
                     "\n\n" + metrics[0][0] + " :\n\t-On Train : " + str(scoreOnTrain) + "\n\t-On Test : " + str(scoreOnTest) + \
                     "\n\nDataset info :\n\t-Database name : " + name + "\n\t-Labels : " + \
                     ', '.join(LABELS_DICTIONARY.values()) + "\n\t-Views : " + ', '.join(views) + "\n\t-" + str(KFolds.n_splits) + \
                     " folds\n\nClassification configuration : \n\t-Algorithm used : NMC " + classifierConfigurationString

    # Scores for every metric, on train and test, plus their textual summary
    metricsScores = getMetricsScores(metrics, trainLabels, testLabels,
                                     validationIndices, learningIndices, labels)
    stringAnalysis += printMetricScore(metricsScores, metrics)
    imagesAnalysis = {}
    return stringAnalysis, imagesAnalysis, metricsScores
Once you have done this, your classifier is ready to be used by the platform, but you can enrich the analyzeResults.py file with a more detailed, classifier-specific analysis of the results.
Adding arguments to avoid hyper parameter optimization
In order to test a specific set of arguments on the platform, you need to add some lines to the argument parser located in Code/MonoMultiViewClassifiers/utils/execution.py, in the parseTheArgs function. You need to add a group of arguments allowing you to pass the hyper-parameters on the command line:
groupNMC = parser.add_argument_group('New Multiview Classifier arguments')
groupNMC.add_argument('--NMC_weights', metavar='FLOAT', action='store', nargs="+",
                      help='Determine the weights of NMC', type=float,
                      default=[])
groupNMC.add_argument('--NMC_integer', metavar='INT', action='store',
                      help='Determine the integer of NMC', type=int,
                      default=42)
In order for the platform to use these arguments, you need to modify the getArgs function of the NMCModule.py file.
def getArgs(args, benchmark, views, viewsIndices, randomState, directory, resultsMonoview, classificationIndices):
    argumentsList = []
    nbViews = len(views)
    arguments = {"CL_type": "NMC",
                 "views": views,
                 "NB_VIEW": nbViews,
                 "viewsIndices": viewsIndices,
                 "NB_CLASS": len(args.CL_classes),
                 "LABELS_NAMES": args.CL_classes,
                 "NMCKWARGS": {"weights": args.NMC_weights,  # Modified to take the args into account
                               "integer": args.NMC_integer,  # Modified to take the args into account
                               "nbViews": nbViews}}
    argumentsList.append(arguments)
    return argumentsList
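For example, assuming the user launched the platform with --NMC_weights 0.25 0.75 --NMC_integer 12 on a two-view benchmark, the classifier-specific part of the resulting arguments dictionary would be:
{"weights": [0.25, 0.75],  # parsed by argparse from --NMC_weights
 "integer": 12,            # parsed by argparse from --NMC_integer
 "nbViews": 2}             # one entry per view of the benchmark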