Training

The easiest way to train a Reading Machine is to use the scripts provided by macaon_data.
For example, to train a parser called myFrenchParser on the UD treebank French-GSD:
$ cd macaon_data/UD_any
$ ./prepareExperiment.sh UD_French-GSD parser myFrenchParser
$ ./train.sh tsv bin/myFrenchParser

prepareExperiment.sh

This script generates a new experiment directory inside bin/.
The usage is ./prepareExperiment.sh corpusName templateName experimentName.
In the example above, corpusName is UD_French-GSD, templateName is parser and experimentName is myFrenchParser.

train.sh

This script trains your model by calling macaon train with the correct arguments.
The usage is ./train.sh mode experimentPath arguments, where:

  • mode is txt if your model does tokenization, tsv if it doesn't.
  • experimentPath is the relative path to your model.
  • arguments is a list of arguments to pass to macaon train; it can be empty.

Example: $ ./train.sh tsv bin/myFrenchParser -n 30 --batchSize 128

For a list of available arguments, run macaon train -h.
You can inspect how a model has been trained by looking at the file bin/yourModel/train.info.
You can stop training and resume it at any time, which makes it possible to increase the number of epochs.
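
As a small convenience for inspecting a training run, the sketch below prints an experiment's train.info and reports an error if the file is missing. The function name show_train_info is hypothetical and not part of macaon_data; the only thing taken from this page is the location of train.info inside the experiment directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of macaon_data): print the training record
# of an experiment, or fail with a message if no train.info is found.
show_train_info() {
    experiment_path=$1                      # e.g. bin/myFrenchParser
    info_file="$experiment_path/train.info"
    if [ ! -f "$info_file" ]; then
        echo "no train.info found in $experiment_path" >&2
        return 1
    fi
    cat "$info_file"
}
```

Once the function is defined in your shell, it could be called as show_train_info bin/myFrenchParser.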

evaluate.sh

This script evaluates your trained model against the test corpora using the official CoNLL 2018 Shared Task evaluation script.
Under the hood, it calls macaon decode. The usage is ./evaluate.sh mode experimentPath arguments, where:

  • mode is txt if your model does tokenization, tsv if it doesn't.
  • experimentPath is the relative path to your model.
  • arguments is a list of arguments to pass to macaon decode; it can be empty.

Example: $ ./evaluate.sh tsv bin/myFrenchParser
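
Since train.sh and evaluate.sh take the same mode and experimentPath, the two steps can be chained. The following is a minimal sketch, assuming it is run from the directory that contains both scripts (macaon_data/UD_any in the example at the top of this page). The function name train_and_evaluate is hypothetical, and extra arguments are forwarded to train.sh only.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of macaon_data): train a model, then
# evaluate it only if training succeeded. Run it from the directory that
# contains train.sh and evaluate.sh.
train_and_evaluate() {
    if [ "$#" -lt 2 ]; then
        echo "usage: train_and_evaluate mode experimentPath [train arguments]" >&2
        return 2
    fi
    mode=$1              # txt or tsv, same meaning as for train.sh
    experiment_path=$2   # e.g. bin/myFrenchParser
    shift 2              # remaining arguments go to train.sh only
    ./train.sh "$mode" "$experiment_path" "$@" &&
        ./evaluate.sh "$mode" "$experiment_path"
}
```

For instance, train_and_evaluate tsv bin/myFrenchParser -n 30 would run the training command shown earlier with -n 30 and then evaluate the resulting model.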

Using your trained model

Once a model has been trained, you can use it to annotate text.
If your model doesn't do tokenization, your input file must be formatted in the CoNLL-U Plus Format. Otherwise, your input file must be raw UTF-8 text.
To use your trained model myFrenchParser to annotate the text in the file myFrenchFile.conllu:

  • $ macaon decode --model bin/myFrenchParser --inputTSV myFrenchFile.conllu

The annotated file will be printed to the standard output.
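
Because the result goes to the standard output, it can be saved with an ordinary shell redirection. The sketch below wraps the command above in a hypothetical function, annotate_to_file, that writes the annotation to a file of your choice. It only covers the CoNLL-U Plus input case, since --inputTSV is the only input option shown on this page; the MACAON variable is an assumption added so that the binary location can be overridden.

```shell
#!/bin/sh
# Hypothetical helper: run macaon decode on a CoNLL-U Plus file and save
# the annotated output instead of printing it.
# MACAON is an assumed override for when the binary is not on the PATH.
MACAON="${MACAON:-macaon}"

annotate_to_file() {
    model=$1    # e.g. bin/myFrenchParser
    input=$2    # CoNLL-U Plus file, passed through --inputTSV
    output=$3   # file that receives the annotated text
    "$MACAON" decode --model "$model" --inputTSV "$input" > "$output"
}
```

A call such as annotate_to_file bin/myFrenchParser myFrenchFile.conllu myFrenchFile.annotated.conllu would leave the annotated text in the third file, whose name here is only an illustration.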
