# Classifier
The classifier is a neural network: at each step, it takes the current [configuration](readingMachine.md) as input and predicts the next [transition](transitionSet.md) to take.
The classifier must be defined in the [Reading Machine](readingMachine.md) file.\
Its definition is made of three parts:
* In the first part, we define:
* A name.
* For each state, the [Transition Set](transitionSet.md) file associated with it.
* Optionally, for each state, a scalar by which the training loss is multiplied (default: 1.0).
* The network type: Random (no neural network) or Modular (see below).
Example:
```
Classifier : tagparser
{
Transitions : {tagger,data/tagger.ts morpho,data/morpho_parts.ts parser,data/parser.ts segmenter,data/segmenter.ts}
LossMultiplier : {segmenter,10.0}
Network type : Modular
```
* In the second part, we define the feature function and the architecture of the neural network (see below for a complete overview of the Modular network type). This part must end with the line 'End'. Example:
```
StateName : Out{64}
Context : Buffer{-3 -2 -1 0 1 2} Stack{} Columns{FORM} LSTM{1 1 0 1} In{64} Out{64}
Focused : Column{FEATS} NbElem{13} Buffer{-1 0} Stack{2 1 0} LSTM{1 1 0 1} In{64} Out{64}
InputDropout : 0.5
MLP : {2048 0.3 2048 0.3}
End
```
* In the third part, we define the hyperparameters of the optimizer. Currently available optimizers are:
* Adam {learningRate beta1 beta2 epsilon weightDecay useAMSGRAD}
Example:
```
Optimizer : Adam {0.0002 0.9 0.999 0.00000001 0.00001 true}
}
```
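Here, the values map positionally onto the Adam parameters listed above: learningRate = 0.0002, beta1 = 0.9, beta2 = 0.999, epsilon = 0.00000001, weightDecay = 0.00001, and useAMSGRAD = true.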
## Network type : Random
Predictions are chosen at random; there are no parameters to learn.\
The purpose of this network type is to debug a [Reading Machine](readingMachine.md) topology, because it is very fast.\
There is nothing else to define: you can put the 'End' line just after the line 'Network type : Random', as in the sketch below.
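A minimal sketch of a Random classifier (the name and the transition set path are placeholders, and the Optimizer line is kept here on the assumption that the file format still expects one even though nothing is learned):
```
Classifier : debugClassifier
{
Transitions : {tagger,data/tagger.ts}
Network type : Random
End
Optimizer : Adam {0.0002 0.9 0.999 0.00000001 0.00001 true}
}
```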
## Network type : Modular
Each line of the definition of the Modular network type corresponds to a module.\
The order of the modules in the definition is not important, and you can use the same module multiple times.\
There are two mandatory modules:
* `MLP : {Layer1Size Layer1Dropout...}`\
Definition of the multi-layer perceptron that takes as input the concatenation of the outputs of all other modules and acts as the output layers of the neural network.\
The last layer (output layer) is not part of the definition because its size is deduced dynamically from the number of outgoing transitions of the current state; thus you only need to define the hidden layers of the MLP.\
Example defining an MLP with two hidden layers of respective sizes 2048 and 1024 and respective dropouts 0.3 and 0.1:
```
MLP : {2048 0.3 1024 0.1}
```
* `InputDropout : scalar`\
Dropout rate (between 0.0 and 1.0) applied to the input of the MLP.\
Example: `InputDropout : 0.5`
Then there is a list of optional modules you can choose from:
* `StateName : Out{embeddingSize}`\
An embedding of size *embeddingSize* representing the name of the current state.
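Example (as in the definition above): `StateName : Out{64}`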
* `Context : Buffer{$1} Stack{$2} Columns{$3} $4{$5 $6 $7 $8} In{$9} Out{$10}`\
An embedding capturing a relative context around the machine's current word index (see the worked example after this list).
* $1 : List of relative buffer indexes to capture. Ex `{-3 -2 -1 0 1 2}`.
* $2 : List of stack indexes to capture. Ex `{2 1 0}`.
* $3 : List of column names to capture. Ex `{FORM UPOS}`.
* $4 : Type of recurrent module used to generate the context embedding: LSTM or GRU.
* $5 : Whether to use a bidirectional RNN: 1 or 0.
* $6 : Number of RNN layers (minimum 1).
* $7 : Dropout applied after the RNN hidden layers; must be 0 if the number of layers is 1.
* $8 : 1 to concatenate all RNN hidden states, 0 to use only the last hidden state.
* $9 : Size of the embeddings fed to the RNN.
* $10 : Size of the RNN hidden states.
* `Focused : Column{$1} NbElem{$2} Buffer{$3} Stack{$4} $5{$6 $7 $8 $9} In{$10} Out{$11}`\
An embedding capturing a specific string, viewed as a sequence of elements (see the worked example after this list).\
If Column = FORM, the elements are the letters; if Column = FEATS, the elements are the traits.
* $1 : Column name to capture. Ex `{FORM}`.
* $2 : Maximum number of elements (e.g. the maximum number of letters in a word).
* $3 : List of relative buffer indexes to capture. Ex `{-3 -2 -1 0 1 2}`.
* $4 : List of stack indexes to capture. Ex `{2 1 0}`.
* $5 : Type of recurrent module used to encode the sequence of elements: LSTM or GRU.
* $6 : Whether to use a bidirectional RNN: 1 or 0.
* $7 : Number of RNN layers (minimum 1).
* $8 : Dropout applied after the RNN hidden layers; must be 0 if the number of layers is 1.
* $9 : 1 to concatenate all RNN hidden states, 0 to use only the last hidden state.
* $10 : Size of the embeddings fed to the RNN.
* $11 : Size of the RNN hidden states.
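As a worked example, the Context line from the definition at the top of this page:
```
Context : Buffer{-3 -2 -1 0 1 2} Stack{} Columns{FORM} LSTM{1 1 0 1} In{64} Out{64}
```
captures the FORM column of the six buffer cells around the current word index (and no stack cells), encoded by a bidirectional single-layer LSTM with no dropout whose hidden states are all concatenated, with input embeddings of size 64 and hidden states of size 64.

Similarly, the Focused line:
```
Focused : Column{FEATS} NbElem{13} Buffer{-1 0} Stack{2 1 0} LSTM{1 1 0 1} In{64} Out{64}
```
encodes the FEATS column of buffer cells -1 and 0 and stack cells 2, 1 and 0, each viewed as a sequence of at most 13 traits, using the same bidirectional single-layer LSTM configuration with input embeddings of size 64 and hidden states of size 64.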