    How to add a multiview classifier to the platform

    File addition

    • In the Code/MonoMultiViewClassifiers/MultiviewClassifiers package, add a new package named after your multiview classifier (let's call it NMC for New Multiview Classifier).

    • In this package (Code/MonoMultiViewClassifiers/MultiviewClassifiers/NMC), add a file called NMCModule.py and another one called analyzeResults.py. These will be the two files used by the platform to communicate with your implementation.

    • You can now either add a package named after your classifier (NMCPackage) and paste your implementation files in it, or just add a single file with the same name if that is enough (see the layout sketch after this list).
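
    The resulting layout might look like the following sketch; the __init__.py files are an assumption, added because the directories are described as Python packages:

    Code/MonoMultiViewClassifiers/MultiviewClassifiers/NMC/
        __init__.py
        NMCModule.py
        analyzeResults.py
        NMCPackage/            # or a single file holding your implementation
            __init__.py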

    NMCModule.py

    Here we list all the functions that the Python module must provide for the platform to be able to use NMC.

    The functions

    getArgs

    This function is used to generate multiple argument dictionaries from one benchmark entry. It must return argumentsList, to which it must have appended at least one dictionary containing all the information needed to run NMC. You must fill in all the general fields describing the classifier, plus a field called NMCKWARGS (<classifier_name>KWARGS) containing another dictionary with the classifier-specific arguments (we assume here that NMC has two hyper-parameters: a set of weights and an integer).

    In [2]:
    arguments = {"CL_type":"NMC", 
                "views":["all", "the", "views", "names"],
                "NB_VIEW":len(["all", "the", "views", "names"]), 
                "viewsIndices":["the indices", "of the", "views in", "the hdf5 file"], 
                "NB_CLASS": "the number of labels of the dataset", 
                "LABLELS_NAMES": ["the names of", "the labels used"], 
                "NMCKWARGS":{"weights":[], 
                             "integer":42,
                             "nbViews":5}
                }

    To fill these fields, you can use the values passed as arguments to the function:

    def getArgs(args, benchmark, views, viewsIndices, randomState, directory, resultsMonoview, classificationIndices):
        argumentsList = []
        nbViews = len(views)
        arguments = {"CL_type": "NMC",
                     "views": views,
                     "NB_VIEW": len(views),
                     "viewsIndices": viewsIndices,
                     "NB_CLASS": len(args.CL_classes),
                     "LABELS_NAMES": args.CL_classes,
                     "NMCKWARGS": {"weights":[],
                                   "integer":42,
                                  "nbViews":5}}
        argumentsList.append(arguments)
        return argumentsList

    This function is also used to add the user-defined configuration of the classifier, but we will come back to that in the last section.

    genName

    This function is used to generate a short string describing the classifier using its configuration.

    In [3]:
    def genName(config):
        return "NMF"

    Some classifiers, such as some late fusion classifiers, will have more complicated genName functions that need to summarize, in a short string, which monoview classifiers they use. They can do so through the config argument, which is exactly the dictionary called "NMCKWARGS" in the getArgs function. An illustrative sketch is given below.
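
    For instance, a hypothetical late fusion classifier could build its name from the monoview classifier names stored in its configuration (the "classifiersNames" key below is an assumption used only for illustration):

    def genName(config):
        # Hypothetical late fusion example: concatenate the names of the
        # monoview classifiers stored in the configuration dictionary.
        return "LateFusion-" + "-".join(config["classifiersNames"])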

    getBenchmark

    This function is used to generate the benchmark argument of getArgs. It stores all the different configurations that will have to be tested (this does not include hyper-parameter sets). For example, for the Mumbo classifier, it will store the list of possible algorithms to use as weak learners.

    In [4]:
    def getBenchmark(benchmark, args=None):
        benchmark["Multiview"]["NMC"] = ["Some NMC cnfigurations"]
        return benchmark

    The benchmark argument is pre-generated with an entry for each of the multiview classifiers, so you just need to fill your entry with the different configurations. A possible resulting structure is sketched below.
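
    As an illustration, after this call the benchmark dictionary could contain something like the following (the entries other than "NMC" are only assumptions about the pre-generated structure):

    benchmark = {"Multiview": {"Mumbo": ["Some Mumbo configurations"],
                               "NMC": ["Some NMC configurations"]}}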

    genParamsSets

    This function is used to generate random hyper-parameter sets so that a randomized search can estimate the best one. It works in tandem with the setParams method implemented in the classifier's class, so you need to keep in mind the order in which you list the hyper-parameters here.

    The classificationKWARGS argument is the "NMCKWARGS" dictionary seen earlier, and it is highly recommended to use the randomState object (a numpy RandomState instance) to generate random numbers, so that the results are reproducible.

    Assuming our NMC classifier has two hyper-parameters, a vector with one weight per view and an integer between 1 and 100, the genParamsSets function will look like:

    In [5]:
    def genParamsSets(classificationKWARGS, randomState, nIter=1):
        weightsVector = [randomState.random_sample(classificationKWARGS["nbViews"]) for _ in range(nIter)]
        normalizedWeights = [weights/np.sum(weights) for weights in weightsVector]
        intsVector = list(randomState.randint(1, 100, nIter))
        paramsSets = [[normalizedWeight, integer] for normalizedWeight, integer in zip(normalizedWeights, intsVector)]
        return paramsSets

    The NMC class

    The class has to be named after the classifier, with Class appended to its name (here, NMCClass).

    In [6]:
    class NMCClass:
        pass

    __init__ method

    There is nothing specific to define in the __init__ method; you just need to initialize the attributes of your classifier. The kwargs argument is the NMCKWARGS dictionary seen earlier. In our example, NMC uses two hyper-parameters: the weights and an integer.

    In [2]:
    def __init__(self, randomState, NB_CORES=1, **kwargs):
        if kwargs["weights"] == []:
            self.weights = randomState.random_sample(kwargs["nbViews"])
        else:
            self.weights = kwargs["weights"]
        self.weights /= np.sum(self.weights)
        self.integer = kwargs["integer"]

    setParams method

    This method is used to tune your classifier with a set of hyper-parameters. Its input, paramsSet, is a list of parameter values given in the same order as the one used in the genParamsSets function seen earlier.

    def setParams(self, paramsSet):
        self.weights = paramsSet[0]
        self.integer = paramsSet[1]
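
    As a quick sanity check, a hyper-parameter set produced by genParamsSets can be fed directly to setParams. A minimal sketch, assuming the NMCKWARGS values used in the previous examples and that the __init__ and setParams methods above have been added to NMCClass:

    import numpy as np

    randomState = np.random.RandomState(42)
    classificationKWARGS = {"weights": [], "integer": 42, "nbViews": 5}

    classifier = NMCClass(randomState, **classificationKWARGS)
    paramsSet = genParamsSets(classificationKWARGS, randomState, nIter=1)[0]
    classifier.setParams(paramsSet)  # weights first, then the integer, as in genParamsSets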

    fit_hdf5 method

    This method plays the same role as sklearn's fit method but takes an HDF5 dataset as input, in order to lower the memory usage of the whole platform.

    • The DATASET object is an HDF5 dataset file containing all the views and labels.
    • The usedIndices object is a numpy 1d-array containing the indices of the examples we want to learn from.
    • The viewsIndices object is a numpy 1d-array containing the indices of the views we want to learn from.
    In [8]:
    def fit_hdf5(self, DATASET, usedIndices=None, viewsIndices=None):
        # Call the fit function of your own module
        pass
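
    For illustration, the body of fit_hdf5 usually starts by extracting the requested views and labels from the HDF5 file before delegating to your own fit function. A minimal sketch, in which the HDF5 key names ("View0", "View1", ..., "Labels") and the self.model attribute are assumptions:

    import numpy as np

    def fit_hdf5(self, DATASET, usedIndices=None, viewsIndices=None):
        if usedIndices is None:
            # Default to learning from every example (assumption on the default behaviour)
            usedIndices = np.arange(DATASET.get("Labels").shape[0])
        if viewsIndices is None:
            viewsIndices = range(len(self.weights))  # one weight per view in our example
        # Load each selected view as a numpy array restricted to the used examples
        views = [DATASET.get("View" + str(int(viewIndex)))[...][usedIndices, :]
                 for viewIndex in viewsIndices]
        labels = DATASET.get("Labels")[...][usedIndices]
        # Call the fit function of your own module, e.g. self.model.fit(views, labels)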

    predict_hdf5 method

    This method is the HDF5-compatible counterpart of sklearn's predict method. It takes the same inputs as the fit_hdf5 method but returns a 1d-array containing the predicted labels of the requested examples (ordered as in usedIndices).

    In [11]:
    def predict_hdf5(self, DATASET, usedIndices=None, viewsIndices=None):
        # Call the predict function of your own module
        predictedLabels = None # Just to avoid any ipynb running error
        return predictedLabels

    Once you've added everything to the NMCModule.py file, you are close to being able to run your algorithm on the platform; you just need to fill in the analyzeResults.py file.

    analyzeResults.py

    The analyzeResults.py file is a module used to produce a result analysis specific to your classifier. In order to run the platform, you have to add a unique function called execute that runs the analysis and returns three variables:

    • stringAnalysis is a string that will be saved in a file to describe the classifier and its performance; it may also give some insights on the interpretation of its way of classifying.
    • imagesAnalysis is a dictionary in which you can store images (as values) describing the classifier; the keys will be the image names.
    • metricsScores is a dictionary where the values are lists containing the train and test scores, and the keys are the metric names (for example metricsScores = {"accuracy_score":[0.99, 0.10]}).

    The execute function has the following inputs:

    • classifier is a classifier object from your classifiers class
    • trainLabels are the labels predicted for the train set by the classifier
    • testLabels are the labels predicted for the test set by the classifier
    • DATASET is the HDF5 dataset object
    • classificationKWARGS is the dictionary named NMCKWARGS earlier
    • classificationIndices is a triplet containing the learning indices, the validation indices and the test indices used for multiclass classification
    • LABELS_DICTIONARY is a dictionary containing a label as a key and its name as a value
    • views is the list of the views names used by the classifier
    • nbCores is an int fixing the number of threads used by the platform
    • times is a tuple containing the extraction time and the classification time
    • name is the name of the database on which the platform is running
    • KFolds is an sklearn kfold object used for the cross-validation
    • hyperParamSearch is the type of the hyper parameters optimization method
    • nIter is the number of iterations of the hyper parameters method
    • metrics is the list of the metrics and their arguments
    • viewsIndices is 1d-array of the indices of the views used for classification
    • randomState is a numpy RandomState object
    • labels are the ground truth labels of the dataset

    The basic result analysis function, shared by all the classifiers, looks like:

    from ... import Metrics
    from ...utils.MultiviewResultAnalysis import printMetricScore, getMetricsScores
    
    def execute(classifier, trainLabels,
                testLabels, DATASET,
                classificationKWARGS, classificationIndices,
                LABELS_DICTIONARY, views, nbCores, times,
                name, KFolds,
                hyperParamSearch, nIter, metrics,
                viewsIndices, randomState, labels):
        CLASS_LABELS = labels
        learningIndices, validationIndices, testIndicesMulticlass = classificationIndices
    
        metricModule = getattr(Metrics, metrics[0][0])
        if metrics[0][1] is not None:
            metricKWARGS = dict((index, metricConfig) for index, metricConfig in enumerate(metrics[0][1]))
        else:
            metricKWARGS = {}
        scoreOnTrain = metricModule.score(CLASS_LABELS[learningIndices], trainLabels, **metricKWARGS)
        scoreOnTest = metricModule.score(CLASS_LABELS[validationIndices], testLabels, **metricKWARGS)
    
        # To be modified to fit to your classifier 
        classifierConfigurationString = "with weights : "+ ", ".join(map(str, list(classifier.weights))) + ", and integer : "+str(classifier.integer)
        # Modify the name of the classifier in these strings
        stringAnalysis = "\t\tResult for Multiview classification with NMC "+ \
                         "\n\n" + metrics[0][0] + " :\n\t-On Train : " + str(scoreOnTrain) + "\n\t-On Test : " + str(
            scoreOnTest) + \
                         "\n\nDataset info :\n\t-Database name : " + name + "\n\t-Labels : " + \
                         ', '.join(LABELS_DICTIONARY.values()) + "\n\t-Views : " + ', '.join(views) + "\n\t-" + str(
            KFolds.n_splits) + \
                         " folds\n\nClassification configuration : \n\t-Algorithm used : NMC " + classifierConfigurationString
    
        metricsScores = getMetricsScores(metrics, trainLabels, testLabels,
                                         validationIndices, learningIndices, labels)
        stringAnalysis += printMetricScore(metricsScores, metrics)
    
        imagesAnalysis = {}
        return stringAnalysis, imagesAnalysis, metricsScores

    Once you have done this, your classifier is ready to be used by the platform, but you can also enrich the analyzeResults file with a more detailed description of your classifier, as sketched below.
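
    For example, you could append an interpretation paragraph to stringAnalysis or store a figure in imagesAnalysis before returning. A minimal sketch, to be inserted just before the return statement of execute; storing a matplotlib figure object as the dictionary value is an assumption about how the platform saves images:

    import matplotlib.pyplot as plt

    # Describe how NMC weighted the views
    stringAnalysis += "\n\nInterpretation :\n\t-View weights : " + \
                      ", ".join(map(str, classifier.weights))

    # Store a bar plot of the view weights under the image name "NMC-view-weights"
    fig = plt.figure()
    plt.bar(range(len(classifier.weights)), classifier.weights)
    plt.title("NMC view weights")
    imagesAnalysis["NMC-view-weights"] = fig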

    Adding arguments to avoid hyper parameter optimization

    In order to test a specific set of hyper-parameters on the platform, you need to add some lines to the argument parser located in the parseTheArgs function of the file Code/MonoMultiViewClassifiers/utils/execution.py. What you need to do is add a group of arguments allowing you to pass the hyper-parameters on the command line:

    groupNMC = parser.add_argument_group('New Multiview Classifier arguments')
    groupNMC.add_argument('--NMC_weights', metavar='FLOAT', action='store', nargs="+",
                                     help='Determine the weights of NMC', type=float,
                                     default=[])
    groupNMC.add_argument('--NMC_integer', metavar='INT', action='store',
                                     help='Determine the integer of NMC', type=int,
                                     default=42)

    In order for the platform to use these arguments, you need to modify the getArgs function of the file NMCModule.py.

    In [1]:
    def getArgs(args, benchmark, views, viewsIndices, randomState, directory, resultsMonoview, classificationIndices):
        argumentsList = []
        nbViews = len(views)
        arguments = {"CL_type": "NMC",
                     "views": views,
                     "NB_VIEW": len(views),
                     "viewsIndices": viewsIndices,
                     "NB_CLASS": len(args.CL_classes),
                     "LABELS_NAMES": args.CL_classes,
                     "NMCKWARGS": {"weights":args.NMC_weights,  # Modified to take the args into account
                                   "integer":args.NMC_integer,  # Modified to take the args into account
                                  "nbViews":5}}
        argumentsList.append(arguments)
        return argumentsList