Franck Dary authored
Training
The easiest way to train a Reading Machine is to use the scripts provided by macaon_data.
For example, to train a parser called myFrenchParser on the UD treebank French-GSD:
$ cd macaon_data/UD_any
$ ./prepareExperiment.sh UD_French-GSD parser myFrenchParser
$ ./train.sh tsv bin/myFrenchParser
prepareExperiment.sh
The purpose of this script is simply to generate a new experiment directory inside bin/.
The usage is ./prepareExperiment.sh corpusName templateName experimentName.
train.sh
This script trains your model by calling macaon train with the correct arguments.
The usage is ./train.sh mode experimentPath arguments, where:
- mode is txt if your model does tokenization, tsv if it doesn't.
- experimentPath is the relative path to your model.
- arguments is a list of arguments to pass to macaon train; it can be empty.
Example: $ ./train.sh tsv bin/myFrenchParser -n 30 --batchSize 128
For a list of available arguments, run macaon train -h.
You can inspect how a model has been trained by looking at the file bin/yourModel/train.info.
You can stop training and resume it at any time, which makes it possible to increase the number of epochs.
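The stop-and-resume workflow might be sketched as below. The function name continue_training is hypothetical and not part of macaon_data. The sketch assumes that -n controls the number of epochs, which the text above does not confirm (check macaon train -h), and that it is called from the directory containing train.sh.

```shell
# Hypothetical helper, not part of macaon_data.
# Assumption: -n controls the number of epochs; verify with `macaon train -h`.
continue_training() {
    mode="$1"        # txt or tsv, as for train.sh
    experiment="$2"  # e.g. bin/myFrenchParser
    epochs="$3"      # value passed to -n, e.g. 50

    # Show how the model has been trained so far, if a record exists.
    if [ -f "$experiment/train.info" ]; then
        cat "$experiment/train.info"
    fi

    # Re-run the training script on the same experiment directory.
    ./train.sh "$mode" "$experiment" -n "$epochs"
}
```

A call could look like continue_training tsv bin/myFrenchParser 50.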
evaluate.sh
This script evaluates your trained model against the test corpora using the official CoNLL 2018 Shared Task evaluation script.
Under the hood it is a call to macaon decode. The usage is ./evaluate.sh mode experimentPath arguments, where:
- mode is txt if your model does tokenization, tsv if it doesn't.
- experimentPath is the relative path to your model.
- arguments is a list of arguments to pass to macaon decode; it can be empty.
Example: $ ./evaluate.sh tsv bin/myFrenchParser
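Since prepareExperiment.sh, train.sh and evaluate.sh are run one after the other, they can be chained in a small wrapper. The following is only a sketch: run_experiment is a hypothetical name, and it assumes that each script exits with a non-zero status on failure and that it is called from the directory containing the scripts (for example macaon_data/UD_any).

```shell
# Hypothetical helper, not part of macaon_data: run the three scripts in order.
# Assumption: each script returns a non-zero exit status when it fails.
run_experiment() {
    corpus="$1"    # e.g. UD_French-GSD
    template="$2"  # e.g. parser
    name="$3"      # e.g. myFrenchParser
    mode="$4"      # txt or tsv

    # && stops the chain as soon as one step fails.
    ./prepareExperiment.sh "$corpus" "$template" "$name" &&
        ./train.sh "$mode" "bin/$name" &&
        ./evaluate.sh "$mode" "bin/$name"
}
```

A call could look like run_experiment UD_French-GSD parser myFrenchParser tsv.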
Using your trained model
Once a model has been trained, you can use it to annotate text.
If your model doesn't do tokenization, your input file must be formatted in the CoNLL-U Plus format. Otherwise, your input file must be raw UTF-8 text.
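In the CoNLL-U Plus format, the first line of a file is a comment of the form "# global.columns = ..." that lists the column names. A quick sanity check on an input file could look like the sketch below. The function name has_conllu_plus_header is hypothetical, and whether macaon itself checks this header is not stated here.

```shell
# Hypothetical helper: succeed (exit status 0) if the first line of the file
# is a CoNLL-U Plus column declaration such as
#   # global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
has_conllu_plus_header() {
    head -n 1 "$1" | grep -q '^# global\.columns = '
}
```

For example, has_conllu_plus_header myFrenchFile.conllu || echo "missing global.columns line" reports a file that lacks the declaration.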
To use your trained model myFrenchParser to annotate the text in the file myFrenchFile.conllu:
$ macaon decode --model bin/myFrenchParser --inputTSV myFrenchFile.conllu
The annotated file will be printed to the standard output.
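Because the result goes to standard output, it can be saved with an ordinary shell redirection. The wrapper below is a minimal sketch: annotate_to_file is a hypothetical name, and it uses only the --model and --inputTSV options shown above.

```shell
# Hypothetical helper: annotate one file and save the result to another file.
annotate_to_file() {
    model="$1"   # e.g. bin/myFrenchParser
    input="$2"   # e.g. myFrenchFile.conllu
    output="$3"  # e.g. myFrenchFile.annotated.conllu

    # macaon decode prints to standard output, so redirect it to the output file.
    macaon decode --model "$model" --inputTSV "$input" > "$output"
}
```

A call could look like annotate_to_file bin/myFrenchParser myFrenchFile.conllu myFrenchFile.annotated.conllu.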