diff --git a/docs/source/tutorials/example0.rst b/docs/source/tutorials/example0.rst
new file mode 100644
index 0000000000000000000000000000000000000000..29f81276096193448f4d1649074300fb136af2b4
--- /dev/null
+++ b/docs/source/tutorials/example0.rst
@@ -0,0 +1,19 @@
+============================
+Example 0 : SuMMIT on digits
+============================
+
+In this example, we will run SuMMIT on a famous dataset : `digits <https://scikit-learn.org/stable/auto_examples/datasets/plot_digits_last_image.html>`_
+
+TODO.
+
+Running SuMMIT
+--------------
+
+To run SuMMIT on digits, one has to run :
+
+.. code-block:: python
+
+    >>> from multiview_platform.execute import execute
+    >>> execute("example 0")
+
+This will start a benchmark with this config file located in ``
\ No newline at end of file
diff --git a/docs/source/tutorials/example2.rst b/docs/source/tutorials/example2.rst
index 0c80376d9d58dd3e0fcc39be8b6d506edb920913..638471f9978ed5dd0b085eaf245fbfa319661ad6 100644
--- a/docs/source/tutorials/example2.rst
+++ b/docs/source/tutorials/example2.rst
@@ -227,7 +227,7 @@ The first difference between these two examples is the time to run the benchmark
 **Conclusion**
 
 The split ratio has two consequences :
-- Increasing the test set size decreases the information available in the triain set size so either it helps to vaoid overfitting or it can hide useful information to the classifier and therefor decrease its performance
+- Increasing the test set size decreases the information available in the train set, so it either helps to avoid overfitting or it hides useful information from the classifier and therefore decreases its performance
 - The second consequence is that decreasinf test size will increase the benchmark duration as the classifier will have to learn on more examples, this duration modification is higher if the dataste has high dimensionality.
 
 Example 2.2 : Usage of randomized hyper-parameter optimization :
@@ -328,18 +328,12 @@ with different fold/draws settings :
 
 .. raw:: html
     :file: ./images/durations.html
 
-The duration is in seconds, and we used 2,5,10,15,20 as values for ``nb_folds``
-and 2,5,10,20,30,50,100 for ``n_iter`` with two monoview classifiers and one
-multiview classifier on simulated data.
+.. note::
+
+    The durations are for reference only, as they depend on the hardware.
 
-The hyper-parameter optimization process generates a report for each
-classifier, providing each set of parameters and its cross-validation score,
-to be able to extract the relevant parameters for a future benchmark on the
-same dataset.
-For most of the algorithms, it is possible to paste the report in the config fie,
-for example for the decision tree the ``hps_report`` file
 
 
 
 Example 2.3 : Usage of grid search :
@@ -349,9 +343,8 @@ In SuMMIT, it is possible to use a grid search if one has several possible
 hyper-parameter values in mind to test.
 
 In order to set up the grid search one has to provide in the ``hps_args:``
-argument the names, parameters and values to test. Let us say we want to try
-several depths for a decision tree, and several ``C`` values for a
-linear `SVM <ttps://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html>`_:
+argument the names, parameters and values to test. If one wants to try
+several depths for a decision tree, and several ``n_estimators`` values for
+adaboost, one can provide :
 
 .. code-block:: yaml
 
@@ -359,7 +352,26 @@ linear `SVM <ttps://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.ht
 
   hps_args:
     decision_tree:
      max_depth: [1,2,3,4,5]
-    svm_linear:
-      C: [0.1,0.2,0.3,0.4,0.5]
+    adaboost:
+      n_estimators: [10,15,20,25]
+
+Moreover, for the multiview algorithms, we would like to try two configurations for the late fusion classifier :
+
+.. code-block:: yaml
+
+    weighted_linear_late_fusion:
+        TODO : a more complex example
+
+
+Hyper-parameter report
+<<<<<<<<<<<<<<<<<<<<<<
+
+The hyper-parameter optimization process generates a report for each
+classifier, providing each set of parameters and its cross-validation score,
+to be able to extract the relevant parameters for a future benchmark on the
+same dataset.
+
+For most of the algorithms, it is possible to paste the report in the config file,
+for example for the decision tree the ``hps_report`` file
\ No newline at end of file
diff --git a/docs/source/tutorials/index.rst b/docs/source/tutorials/index.rst
index 2011f5488eb99578dda23501928f3ffa32a7ae3f..f5f68f11f881bf7948b28403a6a0f1a2af092e35 100644
--- a/docs/source/tutorials/index.rst
+++ b/docs/source/tutorials/index.rst
@@ -8,6 +8,7 @@ The following are some tutorials which explain how to use the toolbox.
    :maxdepth: 1
 
    installation
+   example0
    example1
    example2
    example3
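For reference, the grid search configured in ``hps_args`` above corresponds, for the monoview decision tree, to what a scikit-learn ``GridSearchCV`` does. The following standalone sketch (an illustration only, not SuMMIT's internal code; it assumes scikit-learn is installed and uses the digits dataset from Example 0) shows the equivalent search over ``max_depth``:

```python
# Illustration only (not SuMMIT internals): searching max_depth in
# [1, 2, 3, 4, 5] for a decision tree with 5-fold cross-validation,
# as the ``hps_args`` grid above configures it inside SuMMIT.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3, 4, 5]},
    cv=5,  # analogous to a nb_folds setting of 5
)
search.fit(X, y)
print(search.best_params_)  # the retained hyper-parameter set
```

The cross-validation scores of every tried parameter set are available in ``search.cv_results_``, which is the same kind of information the ``hps_report`` file summarizes.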