# Reading Machine

A reading machine can be thought of as a kind of deterministic finite automaton, where:

* The machine is made of *states*, *transitions* and a *strategy*.
* The machine works on a *configuration*, representing the input text to be annotated and its annotations so far.
* A *transition* is an endofunction on the domain of *configurations*, making a small change such as adding a single annotation.
* The *strategy* is a function that takes the current *state* and the chosen *transition* as inputs, and returns the new *state*.
* The machine contains a classifier that is trained to predict the next *transition* to take, given the current *configuration*.
* At each step, until the *configuration* is final:
  1. The classifier predicts the next *transition* to take.
  2. This *transition* is applied to the current *configuration*, yielding the new *configuration*.
  3. The *strategy* determines the new *state*.

A sketch of this loop is given at the end of this page.

## Configuration

A configuration is the current state of the analysis: the input text along with all of the annotations predicted so far.\
It is said to be final when all of the input text has been processed.

## File format

A reading machine is defined in a `.rm` file (or given as an argument to `macaon train`).\
Here is an example of a reading machine that performs tokenization, POS tagging, morphological tagging, dependency parsing and sentence segmentation in a sequential fashion:

```
Name : Tokenizer, Tagger, Morpho and Parser Machine
Classifier : tokeparser
{
  Transitions : {tokenizer,data/tokenizer.ts tagger,data/tagger.ts morpho,data/morpho_parts.ts parser,data/parser_eager_rel_strict.ts segmenter,data/segmenter.ts}
  LossMultiplier : {segmenter,3.0}
  Network type : Modular
  StateName : Out{1024}
  Context : Buffer{-3 -2 -1 1 2} Stack{} Columns{FORM} LSTM{1 1 0 1} In{64} Out{64}
  Context : Buffer{-3 -2 -1 0 1 2} Stack{1 0} Columns{UPOS} LSTM{1 1 0 1} In{64} Out{64}
  Focused : Column{ID} NbElem{1} Buffer{-1 0 1} Stack{2 1 0} LSTM{1 1 0 1} In{64} Out{64}
  Focused : Column{FORM} NbElem{13} Buffer{-1 0 1 2} Stack{2 1 0} LSTM{1 1 0 1} In{64} Out{64}
  Focused : Column{FEATS} NbElem{13} Buffer{-1 0 1 2} Stack{2 1 0} LSTM{1 1 0 1} In{64} Out{64}
  Focused : Column{EOS} NbElem{1} Buffer{-1 0} Stack{} LSTM{1 1 0 1} In{64} Out{64}
  Focused : Column{DEPREL} NbElem{1} Buffer{} Stack{2 1 0} LSTM{1 1 0 1} In{64} Out{64}
  DepthLayerTree : Columns{DEPREL} Buffer{} Stack{2 1 0} LayerSizes{3} LSTM{1 1 0.0 1} In{64} Out{64}
  History : NbElem{10} LSTM{1 1 0 1} In{64} Out{64}
  RawInput : Left{5} Right{5} LSTM{1 1 0.0 1} In{32} Out{32}
  SplitTrans : LSTM{1 1 0.0 1} In{64} Out{64}
  InputDropout : 0.5
  MLP : {2048 0.3 2048 0.3}
  End
  Optimizer : Adam {0.0003 0.9 0.999 0.00000001 0.00002 true}
}
Splitwords : data/splitwords.ts
Predictions : ID FORM UPOS FEATS HEAD DEPREL EOS
Strategy
{
  Block : End{cannotMove}
  tokenizer tokenizer ENDWORD 1
  tokenizer tokenizer SPLIT 1
  tokenizer tokenizer * 0
  Block : End{cannotMove}
  tagger tagger * 1
  Block : End{cannotMove}
  morpho morpho NOTHING 1
  morpho morpho * 0
  Block : End{cannotMove}
  parser segmenter eager_SHIFT 0
  parser segmenter eager_RIGHT_rel 0
  parser parser * 0
  segmenter parser * 1
}
```

This format is composed of several parts:

* Name: The name of your machine.
* Classifier: The name of your classifier, followed by its definition between braces. See [Classifier](classifier.md).
* Splitwords: A [Transition Set](transitionSet.md) file containing the transitions used for multi-word tokenization.\
  It is only mandatory if the machine performs tokenization; this file is automatically generated by `train.sh`.
* Predictions: The names of the columns that are predicted by your machine.
* Strategy: The strategy, followed by its definition between braces. See [Strategy](strategy.md).
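To make the prediction loop described at the top of this page concrete, here is a minimal Python sketch. All of the names in it (`Configuration`, `Transition`, `run_machine`, the classifier's `predict` method) are hypothetical illustrations and are not part of macaon's actual API; the sketch only shows the predict/apply/update cycle.

```python
# Illustrative sketch only: these class and function names are hypothetical
# and do not correspond to macaon's actual implementation.
from typing import Callable, List, Tuple

class Configuration:
    """The input text plus all annotations predicted so far."""
    def __init__(self, words: List[str]):
        self.words = words
        self.annotations = []   # annotations predicted so far
        self.index = 0          # position of the reading head

    def is_final(self) -> bool:
        # A configuration is final when all of the input text
        # has been processed.
        return self.index >= len(self.words)

# A transition is an endofunction on configurations: it takes a
# configuration and returns a slightly modified configuration.
Transition = Callable[[Configuration], Configuration]

def run_machine(config: Configuration,
                state: str,
                classifier,  # predicts a transition from (state, config)
                strategy: Callable[[str, str], str]) -> Configuration:
    """The main loop: predict, apply, then move to the next state."""
    while not config.is_final():
        # 1. The classifier predicts the next transition to take.
        name, transition = classifier.predict(state, config)
        # 2. The transition is applied, yielding a new configuration.
        config = transition(config)
        # 3. The strategy determines the new state.
        state = strategy(state, name)
    return config
```

Note that the transition changes the configuration while the strategy only changes the state; keeping the two separate is what lets a single machine chain several annotation tasks, as in the example above.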
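The `Strategy` block in the example instantiates the abstract strategy function from the first section: a mapping from (state, transition) pairs to the next state. As a rough sketch, and assuming each row lists the current state, the next state, a transition pattern and a movement (an interpretation of the example, not a specification; see [Strategy](strategy.md)), such a table could be read like this:

```python
# Hypothetical reading of a Strategy block as a lookup table.
# The column meanings (current state, next state, transition pattern,
# movement) are an assumption based on the example above; see strategy.md
# for the authoritative semantics.

STRATEGY_TABLE = [
    # current      next         transition  movement
    ("tokenizer", "tokenizer", "ENDWORD",  1),
    ("tokenizer", "tokenizer", "SPLIT",    1),
    ("tokenizer", "tokenizer", "*",        0),
    ("tagger",    "tagger",    "*",        1),
]

def next_state(state: str, transition: str) -> str:
    """Return the new state for a (state, transition) pair.

    Rows are tried in order; '*' acts as a catch-all, so wildcard rows
    should come after the more specific ones, as in the example above.
    """
    for current, nxt, pattern, _movement in STRATEGY_TABLE:
        if current == state and pattern in ("*", transition):
            return nxt
    raise KeyError(f"no strategy rule for ({state}, {transition})")

assert next_state("tokenizer", "ENDWORD") == "tokenizer"
assert next_state("tagger", "tagPOS") == "tagger"
```

In the real format the rules are additionally grouped into `Block`s with end conditions such as `End{cannotMove}`; the wildcard `*` rows in the example play the role of the catch-all above.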
[Back to main page](../README.md)