title='Loss values of {}\nusing best params of previous stages'.format(args.dataset_name))
else:
    raise ValueError('This stage number is not supported yet, but it will be!')
logger.info('Done.')
"""
TODO:
For each dataset:
Stage 1) A figure for the selection of the best base forest model hyperparameters (best vs default/random hyperparams)
Stage 2) A figure for the selection of the best dataset normalization method
Stage 3) A figure for the selection of the best combination of normalizations: dataset normalization vs D normalization vs weights normalization
Stage 4) A figure for the selection of the most relevant subsets combination: train,dev vs train+dev,train+dev vs train,train+dev
Stage 5) A figure for the selection of the best extracted forest size?
Stage 6) A figure to finally compare the performance of our approach using the previously selected parameters vs the baseline vs other papers
Stage 3)
In all axes:
- untrained forest
- trained base forest (straight line because it doesn't depend on the number of extracted trees)
Axis 1:
- test with forest on train+dev and OMP on train+dev
- test with forest on train+dev and OMP on train+dev with dataset normalization
- test with forest on train+dev and OMP on train+dev with dataset normalization + D normalization
- test with forest on train+dev and OMP on train+dev with dataset normalization + weights normalization
- test with forest on train+dev and OMP on train+dev with dataset normalization + D normalization + weights normalization
Axis 2:
- test with forest on train and OMP on dev
- test with forest on train and OMP on dev with dataset normalization
- test with forest on train and OMP on dev with dataset normalization + D normalization
- test with forest on train and OMP on dev with dataset normalization + weights normalization
- test with forest on train and OMP on dev with dataset normalization + D normalization + weights normalization
Axis 3:
- test with forest on train and OMP on train+dev
- test with forest on train and OMP on train+dev with dataset normalization
- test with forest on train and OMP on train+dev with dataset normalization + D normalization
- test with forest on train and OMP on train+dev with dataset normalization + weights normalization
- test with forest on train and OMP on train+dev with dataset normalization + D normalization + weights normalization
IMPORTANT: Use the same seeds in all axes.
"""
Stage 1) [DONE for california_housing] A figure for the selection of the best base forest model hyperparameters (best vs default/random hyperparams)
Stage 2) [DONE for california_housing] A figure for the selection of the best combination of normalization: D normalization vs weights normalization (4 combinations)
Stage 3) [DONE for california_housing] A figure for the selection of the most relevant subsets combination: train,dev vs train+dev,train+dev vs train,train+dev
Stage 4) A figure to finally compare the performance of our approach using the previously selected
parameters vs the baseline vs other papers, using different extracted forest sizes
(percentages of the tree size found previously in the best hyperparams search) on the abscissa.
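For Stage 4, the abscissa values could be derived as in the sketch below. It assumes that "tree size" means the number of trees of the base forest found by the hyperparameter search; that reading, the function name `extracted_forest_sizes`, and the example values are all assumptions, not part of the project.

```python
def extracted_forest_sizes(forest_size, percentages):
    """Map fractions in (0, 1] to distinct extracted forest sizes (at least one tree each)."""
    if forest_size < 1:
        raise ValueError("forest_size must be a positive integer")
    sizes = []
    for fraction in percentages:
        if not 0 < fraction <= 1:
            raise ValueError("each fraction must lie in (0, 1]")
        # Round to the nearest integer, but never extract fewer than one tree.
        size = max(1, int(round(forest_size * fraction)))
        if size not in sizes:  # skip duplicates produced by rounding
            sizes.append(size)
    return sizes


# Placeholder values: a base forest of 100 trees and four fractions of it.
print(extracted_forest_sizes(100, [0.1, 0.25, 0.5, 1.0]))  # [10, 25, 50, 100]
```

Removing duplicates matters for small base forests, where several fractions can round to the same size and would otherwise produce repeated points on the abscissa.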