    Franck Dary authored 6da709d7

    Training

    The easiest way to train a Reading Machine is to use the scripts provided by macaon_data.
    For example, to train a parser called myFrenchParser on the UD treebank French-GSD:
    $ cd macaon_data/UD_any
    $ ./prepareExperiment.sh UD_French-GSD parser myFrenchParser
    $ ./train.sh tsv bin/myFrenchParser
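The commands above can also be chained from a script. The following is a minimal sketch, not part of macaon_data: train_pipeline is a hypothetical Python helper that builds the same prepareExperiment.sh and train.sh invocations. It assumes the script is run from the directory that contains macaon_data. With dry_run=True, the helper only returns the commands and does not execute them.

```python
import subprocess


def train_pipeline(corpus, template, experiment, mode="tsv", extra_args=(),
                   workdir="macaon_data/UD_any", dry_run=False):
    """Hypothetical helper: run prepareExperiment.sh then train.sh for one experiment."""
    # train.sh expects "txt" for models that tokenize and "tsv" for models that don't.
    if mode not in ("txt", "tsv"):
        raise ValueError("mode must be 'txt' or 'tsv'")
    commands = [
        ["./prepareExperiment.sh", corpus, template, experiment],
        ["./train.sh", mode, f"bin/{experiment}", *extra_args],
    ]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError and stops the chain if a step fails.
            subprocess.run(command, cwd=workdir, check=True)
    return commands
```

For instance, train_pipeline("UD_French-GSD", "parser", "myFrenchParser") runs the same two scripts as the example above. Passing the argument list as a Python list avoids any shell quoting issues.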

    prepareExperiment.sh

    The purpose of this script is simply to generate a new experiment directory inside bin/.
    The usage is ./prepareExperiment.sh corpusName templateName experimentName.

    train.sh

    This script trains your model by calling macaon train with the correct arguments.
    The usage is ./train.sh mode experimentPath arguments, where:

    • mode is txt if your model does tokenization, tsv if it doesn't.
    • experimentPath is the relative path to your model.
    • arguments is a list of arguments passed to macaon train; it can be empty.

    Example: $ ./train.sh tsv bin/myFrenchParser -n 30 --batchSize 128

    For a list of available arguments, run macaon train -h.
    You can inspect how a model was trained by looking at the file bin/yourModel/train.info.
    You can stop training and resume it at any time, which also lets you increase the number of epochs.

    evaluate.sh

    This script evaluates your trained model on the test corpora using the official CoNLL 2018 Shared Task evaluation script.
    Under the hood, it calls macaon decode. The usage is ./evaluate.sh mode experimentPath arguments, where:

    • mode is txt if your model does tokenization, tsv if it doesn't.
    • experimentPath is the relative path to your model.
    • arguments is a list of arguments passed to macaon decode; it can be empty.

    Example: $ ./evaluate.sh tsv bin/myFrenchParser
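The following simplified Python sketch shows what the parser scores reported by the evaluation measure: the unlabeled and labeled attachment scores (UAS and LAS). It is not the official CoNLL 2018 script. The sketch assumes that the gold and system files share exactly the same tokenization. In that case, the official F1 scores reduce to plain accuracies over words, and no alignment step is needed. Like the official script, which ignores language-specific subtypes, it compares relations such as nmod:poss only on their universal part. Token and attachment_scores are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Token:
    head: int    # HEAD column: index of the governor, 0 for the root
    deprel: str  # DEPREL column, possibly with a subtype such as "nmod:poss"


def universal_deprel(deprel: str) -> str:
    # Keep only the universal part of the relation ("nmod:poss" -> "nmod").
    return deprel.split(":")[0]


def attachment_scores(gold: list[Token], system: list[Token]) -> tuple[float, float]:
    """Return (UAS, LAS) in percent, assuming identical tokenization."""
    if len(gold) != len(system):
        raise ValueError("this sketch requires identical tokenization")
    if not gold:
        return 0.0, 0.0
    uas_hits = las_hits = 0
    for g, s in zip(gold, system):
        if g.head == s.head:
            uas_hits += 1  # correct governor
            if universal_deprel(g.deprel) == universal_deprel(s.deprel):
                las_hits += 1  # correct governor and relation
    n = len(gold)
    return 100.0 * uas_hits / n, 100.0 * las_hits / n
```

In this example, a system that finds every head but mislabels obj as obl on one of three words gets a UAS of 100 and a LAS of about 66.67.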

    Using your trained model

    Once a model has been trained, you can use it to annotate text.
    If your model doesn't do tokenization, your input file must be formatted in the CoNLL-U Plus format. Otherwise, it must be raw UTF-8 text.
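If your text is already tokenized but not yet in CoNLL-U Plus, you can produce a file like myFrenchFile.conllu with a few lines of Python. This is a minimal sketch that makes two assumptions. First, to_conllu_plus is a hypothetical helper. Second, it declares the ten standard CoNLL-U columns in the # global.columns header and fills every column except ID and FORM with _. Check which columns your model actually expects before relying on this.

```python
# Assumption: the ten standard CoNLL-U columns; adapt to what your model reads.
COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS", "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def to_conllu_plus(sentences: list[list[str]]) -> str:
    """Hypothetical helper: turn pre-tokenized sentences into CoNLL-U Plus text."""
    # CoNLL-U Plus files declare their columns on the first line.
    lines = ["# global.columns = " + " ".join(COLUMNS)]
    for sentence in sentences:
        for index, form in enumerate(sentence, start=1):
            # Only ID and FORM are known; every other column is left unspecified ("_").
            row = [str(index), form] + ["_"] * (len(COLUMNS) - 2)
            lines.append("\t".join(row))
        lines.append("")  # a blank line terminates each sentence
    return "\n".join(lines) + "\n"
```

Write the result as UTF-8, for example with Path("myFrenchFile.conllu").write_text(to_conllu_plus(sentences), encoding="utf-8").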
    To use your trained model myFrenchParser to annotate the text in the file myFrenchFile.conllu:

    • $ macaon decode --model bin/myFrenchParser --inputTSV myFrenchFile.conllu

    The annotated file will be printed to the standard output.
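Because the result goes to standard output, you can redirect it to a file or pipe it into another program. The reader below is a hypothetical Python sketch that maps each token line to a dictionary keyed by column name. It assumes the output is itself CoNLL-U Plus with a # global.columns header, so check your model's actual output first.

```python
def read_conllu_plus(lines):
    """Yield one sentence at a time as a list of {column name: value} dicts."""
    columns = None
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# global.columns ="):
            # The header names the columns of every following token line.
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comment lines (sent_id, text, ...) are skipped here
        elif line.strip() == "":
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        else:
            if columns is None:
                raise ValueError("missing '# global.columns =' header")
            sentence.append(dict(zip(columns, line.split("\t"))))
    if sentence:
        yield sentence  # last sentence if the input lacks a final blank line
```

For example, a script that iterates over read_conllu_plus(sys.stdin) can consume the output of macaon decode --model bin/myFrenchParser --inputTSV myFrenchFile.conllu through a pipe.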
