    How to add a multiview classifier to the platform

    File addition

    • In the Code/MonoMultiViewClassifiers/MultiviewClassifiers package, add a new package named after your multiview classifier (let's call it NMC for New Multiview Classifier).

    • In this package (Code/MonoMultiViewClassifiers/MultiviewClassifiers/NMC), add a file called NMCModule.py and another one called analyzeResults.py. These will be the two files used by the platform to communicate with your implementation.

    • You can now either add a package named after your classifier (NMCPackage) and paste your implementation files in it, or just add a single file with the same name if that is enough (see the layout sketch after this list).
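
    The resulting layout might look like the following sketch; the __init__.py files are an assumption, added because the directories are described as Python packages:

    Code/MonoMultiViewClassifiers/MultiviewClassifiers/NMC/
        __init__.py
        NMCModule.py
        analyzeResults.py
        NMCPackage/            # or a single file holding your implementation
            __init__.py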

    NMCModule.py

    Here we list all the functions that the Python module must provide for the platform to be able to use NMC.

    The functions

    getArgs

    This function is used to generate multiple argument dictionaries from one benchmark entry. It must return argumentsList, to which it must have appended at least one dictionary containing all the information needed to run NMC. You must fill in all the general fields describing the classifier, plus a field called NMCKWARGS (<classifier_name>KWARGS) containing another dictionary with the classifier-specific arguments (we assume here that NMC has two hyper-parameters: a set of weights and an integer).

    In [2]:
    arguments = {"CL_type":"NMC", 
                "views":["all", "the", "views", "names"],
                "NB_VIEW":len(["all", "the", "views", "names"]), 
                "viewsIndices":["the indices", "of the", "views in", "the hdf5 file"], 
                "NB_CLASS": "the number of labels of the dataset", 
                "LABLELS_NAMES": ["the names of", "the labels used"], 
                "NMCKWARGS":{"weights":[], 
                             "integer":42,
                             "nbViews":5}
                }

    To fill these fields, you can use the values passed as arguments to the function:

    def getArgs(args, benchmark, views, viewsIndices, randomState, directory, resultsMonoview, classificationIndices):
        argumentsList = []
        nbViews = len(views)
        arguments = {"CL_type": "NMC",
                     "views": views,
                     "NB_VIEW": len(views),
                     "viewsIndices": viewsIndices,
                     "NB_CLASS": len(args.CL_classes),
                     "LABELS_NAMES": args.CL_classes,
                     "NMCKWARGS": {"weights":[],
                                   "integer":42,
                                  "nbViews":5}}
        argumentsList.append(arguments)
        return argumentsList

    This function is also used to add the user-defined configuration of the classifier, but we will come back to that in the last section.

    genName

    This function is used to generate a short string describing the classifier using its configuration.

    In [3]:
    def genName(config):
        return "NMF"

    Some classifiers, such as some late fusion classifiers, will have more complicated genName functions that need to summarize, in a short string, which monoview classifiers they use. They can do so through the config argument, which is exactly the dictionary called "NMCKWARGS" in the getArgs function. An illustrative sketch is given below.
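
    For instance, a hypothetical late fusion classifier could build its name from the monoview classifier names stored in its configuration (the "classifiersNames" key below is an assumption used only for illustration):

    def genName(config):
        # Hypothetical late fusion example: concatenate the names of the
        # monoview classifiers stored in the configuration dictionary.
        return "LateFusion-" + "-".join(config["classifiersNames"])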

    getBenchmark

    This function is used to generate the benchmark argument of getArgs. It stores all the different configurations that will have to be tested (this does not include hyper-parameter sets). For example, for the Mumbo classifier, it will store the list of possible algorithms to use as weak learners.

    In [4]:
    def getBenchmark(benchmark, args=None):
        benchmark["Multiview"]["NMC"] = ["Some NMC cnfigurations"]
        return benchmark

    The benchmark argument is pre-generated with an entry for each of the multiview classifiers, so you just need to fill your entry with the different configurations. A possible resulting structure is sketched below.
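
    As an illustration, after this call the benchmark dictionary could contain something like the following (the entries other than "NMC" are only assumptions about the pre-generated structure):

    benchmark = {"Multiview": {"Mumbo": ["Some Mumbo configurations"],
                               "NMC": ["Some NMC configurations"]}}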

    genParamsSets

    This function is used to generate random hyper-parameter sets so that a randomized search can estimate the best one. It works in tandem with the setParams method implemented in the classifier's class, so you need to keep in mind the order in which you list the hyper-parameters here.

    The classificationKWARGS argument is the "NMCKWARGS" dictionary seen earlier, and it is highly recommended to use the randomState object (a numpy RandomState instance) to generate random numbers, so that the results are reproducible.

    Assuming our NMC classifier has two hyper-parameters, a vector with one weight per view and an integer between 1 and 100, the genParamsSets function will look like:

    In [5]:
    def genParamsSets(classificationKWARGS, randomState, nIter=1):
        weightsVector = [randomState.random_sample(classificationKWARGS["nbViews"]) for _ in range(nIter)]
        normalizedWeights = [weights/np.sum(weights) for weights in weightsVector]
        intsVector = list(randomState.randint(1, 100, nIter))
        paramsSets = [[normalizedWeight, integer] for normalizedWeight, integer in zip(normalizedWeights, intsVector)]
        return paramsSets

    The NMC class

    The class has to be named after the classifier, with Class appended to its name (here, NMCClass).

    In [6]:
    class NMCClass:
        pass

    __init__ method

    There is nothing specific to define in the __init__ method; you just need to initialize the attributes of your classifier. The kwargs argument is the NMCKWARGS dictionary seen earlier. In our example, NMC uses two hyper-parameters: the weights and an integer.

    In [2]:
    def __init__(self, randomState, NB_CORES=1, **kwargs):
        if kwargs["weights"] == []:
            self.weights = randomState.random_sample(kwargs["nbViews"])
        else:
            self.weights = kwargs["weights"]
        self.weights /= np.sum(self.weights)
        self.integer = kwargs["integer"]

    setParams method

    This method is used to tune your classifier with a set of hyper-parameters. Its input, paramsSet, is a list of parameter values given in the same order as the one used in the genParamsSets function seen earlier.

    def setParams(self, paramsSet):
        self.weights = paramsSet[0]
        self.integer = paramsSet[1]
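
    As a quick sanity check, a hyper-parameter set produced by genParamsSets can be fed directly to setParams. A minimal sketch, assuming the NMCKWARGS values used in the previous examples and that the __init__ and setParams methods above have been added to NMCClass:

    import numpy as np

    randomState = np.random.RandomState(42)
    classificationKWARGS = {"weights": [], "integer": 42, "nbViews": 5}

    classifier = NMCClass(randomState, **classificationKWARGS)
    paramsSet = genParamsSets(classificationKWARGS, randomState, nIter=1)[0]
    classifier.setParams(paramsSet)  # weights first, then the integer, as in genParamsSets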

    fit_hdf5 method

    This method plays the same role as sklearn's fit method but takes an HDF5 dataset as input, in order to lower the memory usage of the whole platform.

    • The DATASET object is an HDF5 dataset file containing all the views and labels.
    • The usedIndices object is a numpy 1d-array containing the indices of the examples we want to learn from.
    • The viewsIndices object is a numpy 1d-array containing the indices of the views we want to learn from.
    In [8]:
    def fit_hdf5(self, DATASET, usedIndices=None, viewsIndices=None):
        # Call the fit function of your own module
        pass
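
    For illustration, the body of fit_hdf5 usually starts by extracting the requested views and labels from the HDF5 file before delegating to your own fit function. A minimal sketch, in which the HDF5 key names ("View0", "View1", ..., "Labels") and the self.model attribute are assumptions:

    import numpy as np

    def fit_hdf5(self, DATASET, usedIndices=None, viewsIndices=None):
        if usedIndices is None:
            # Default to learning from every example (assumption on the default behaviour)
            usedIndices = np.arange(DATASET.get("Labels").shape[0])
        if viewsIndices is None:
            viewsIndices = range(len(self.weights))  # one weight per view in our example
        # Load each selected view as a numpy array restricted to the used examples
        views = [DATASET.get("View" + str(int(viewIndex)))[...][usedIndices, :]
                 for viewIndex in viewsIndices]
        labels = DATASET.get("Labels")[...][usedIndices]
        # Call the fit function of your own module, e.g. self.model.fit(views, labels)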

    predict_hdf5 method

    This method is the HDF5-compatible counterpart of sklearn's predict method. It takes the same inputs as the fit_hdf5 method but returns a 1d-array containing the predicted labels of the requested examples (ordered as in usedIndices).

    In [11]:
    def predict_hdf5(self, DATASET, usedIndices=None, viewsIndices=None):
        # Call the predict function of your own module
        predictedLabels = None # Just to avoid any ipynb running error
        return predictedLabels

    Once you've added everything to the NMCModule.py file, you are close to being able to run your algorithm on the platform; you just need to fill in the analyzeResults.py file.

    analyzeResults.py

    The analyzeResults.py file is a module used to produce a result analysis specific to your classifier. In order to run the platform, you have to add a unique function called execute that runs the analysis and returns three variables:

    • stringAnalysis is a string that will be saved in a file to describe the classifier and its performance; it may also give some insights on the interpretation of its way of classifying.
    • imagesAnalysis is a dictionary in which you can store images (as values) describing the classifier; the keys will be the image names.
    • metricsScores is a dictionary where the values are lists containing the train and test scores, and the keys are the metric names (for example metricsScores = {"accuracy_score":[0.99, 0.10]}).

    The execute function has the following inputs:

    • classifier is a classifier object from your classifiers class
    • trainLabels are the labels predicted for the train set by the classifier
    • testLabels are the labels predicted for the test set by the classifier
    • DATASET is the HDF5 dataset object
    • classificationKWARGS is the dictionary named NMCKWARGS earlier
    • classificationIndices is a triplet containing the learning indices, the validation indices and the test indices used for multiclass classification
    • LABELS_DICTIONARY is a dictionary containing a label as a key and its name as a value
    • views is the list of the views names used by the classifier
    • nbCores is an int fixing the number of threads used by the platform
    • times is a tuple containing the extraction time and the classification time
    • name is the name of the database on which the platform is running
    • KFolds is an sklearn kfold object used for the cross-validation
    • hyperParamSearch is the type of the hyper parameters optimization method
    • nIter is the number of iterations of the hyper parameters method
    • metrics is the list of the metrics and their arguments
    • viewsIndices is 1d-array of the indices of the views used for classification
    • randomState is a numpy RandomState object
    • labels are the ground truth labels of the dataset

    The basic result analysis function, shared by all the classifiers, looks like:

    from ... import Metrics
    from ...utils.MultiviewResultAnalysis import printMetricScore, getMetricsScores
    
    def execute(classifier, trainLabels,
                testLabels, DATASET,
                classificationKWARGS, classificationIndices,
                LABELS_DICTIONARY, views, nbCores, times,
                name, KFolds,
                hyperParamSearch, nIter, metrics,
                viewsIndices, randomState, labels):
        CLASS_LABELS = labels
        learningIndices, validationIndices, testIndicesMulticlass = classificationIndices
    
        metricModule = getattr(Metrics, metrics[0][0])
        if metrics[0][1] is not None:
            metricKWARGS = dict((index, metricConfig) for index, metricConfig in enumerate(metrics[0][1]))
        else:
            metricKWARGS = {}
        scoreOnTrain = metricModule.score(CLASS_LABELS[learningIndices], trainLabels, **metricKWARGS)
        scoreOnTest = metricModule.score(CLASS_LABELS[validationIndices], testLabels, **metricKWARGS)
    
        # To be modified to fit to your classifier 
        classifierConfigurationString = "with weights : "+ ", ".join(map(str, list(classifier.weights))) + ", and integer : "+str(classifier.integer)
        # Modify the name of the classifier in these strings
        stringAnalysis = "\t\tResult for Multiview classification with NMC "+ \
                         "\n\n" + metrics[0][0] + " :\n\t-On Train : " + str(scoreOnTrain) + "\n\t-On Test : " + str(
            scoreOnTest) + \
                         "\n\nDataset info :\n\t-Database name : " + name + "\n\t-Labels : " + \
                         ', '.join(LABELS_DICTIONARY.values()) + "\n\t-Views : " + ', '.join(views) + "\n\t-" + str(
            KFolds.n_splits) + \
                         " folds\n\nClassification configuration : \n\t-Algorithm used : NMC " + classifierConfigurationString
    
        metricsScores = getMetricsScores(metrics, trainLabels, testLabels,
                                         validationIndices, learningIndices, labels)
        stringAnalysis += printMetricScore(metricsScores, metrics)
    
        imagesAnalysis = {}
        return stringAnalysis, imagesAnalysis, metricsScores

    Once you have done this, your classifier is ready to be used by the platform, but you can also enrich the analyzeResults file with a more detailed description of your classifier, as sketched below.
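
    For example, you could append an interpretation paragraph to stringAnalysis or store a figure in imagesAnalysis before returning. A minimal sketch, to be inserted just before the return statement of execute; storing a matplotlib figure object as the dictionary value is an assumption about how the platform saves images:

    import matplotlib.pyplot as plt

    # Describe how NMC weighted the views
    stringAnalysis += "\n\nInterpretation :\n\t-View weights : " + \
                      ", ".join(map(str, classifier.weights))

    # Store a bar plot of the view weights under the image name "NMC-view-weights"
    fig = plt.figure()
    plt.bar(range(len(classifier.weights)), classifier.weights)
    plt.title("NMC view weights")
    imagesAnalysis["NMC-view-weights"] = fig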

    Adding arguments to avoid hyper parameter optimization

    In order to test a specific set of hyper-parameters on the platform, you need to add some lines to the argument parser located in the parseTheArgs function of the file Code/MonoMultiViewClassifiers/utils/execution.py. What you need to do is add a group of arguments allowing you to pass the hyper-parameters on the command line:

    groupNMC = parser.add_argument_group('New Multiview Classifier arguments')
    groupNMC.add_argument('--NMC_weights', metavar='FLOAT', action='store', nargs="+",
                                     help='Determine the weights of NMC', type=float,
                                     default=[])
    groupNMC.add_argument('--NMC_integer', metavar='INT', action='store',
                                     help='Determine the integer of NMC', type=int,
                                     default=42)

    In order for the platform to use these arguments, you need to modify the getArgs function of the file NMCModule.py.

    In [1]:
    def getArgs(args, benchmark, views, viewsIndices, randomState, directory, resultsMonoview, classificationIndices):
        argumentsList = []
        nbViews = len(views)
        arguments = {"CL_type": "NMC",
                     "views": views,
                     "NB_VIEW": len(views),
                     "viewsIndices": viewsIndices,
                     "NB_CLASS": len(args.CL_classes),
                     "LABELS_NAMES": args.CL_classes,
                     "NMCKWARGS": {"weights":args.NMC_weights,  # Modified to take the args into account
                                   "integer":args.NMC_integer,  # Modified to take the args into account
                                  "nbViews":5}}
        argumentsList.append(arguments)
        return argumentsList