Multi Column Description Format
The Multi Column Description Format (mcd) associates labels with the columns of an mcf file. Each line of an mcd file describes one column of the mcf file and is itself made of four columns:
- The first column contains an integer that indicates the column of the mcf that is being described
- The second column associates a label to the mcf column that is being described
- The third column indicates the type of the values found in the column. Three types are defined:
  - VOCAB indicates that the values are labels
  - INT indicates that the values are integers
  - EMB indicates that the values are real-valued vectors (embeddings)
- The fourth column is used for embeddings: it is the name of the file containing the embeddings
Here is an example of an mcd file:

Column | Label | Type | Embedding file
---- | ---- | ---- | ----
1 | FORM | VOCAB | _
2 | POS | VOCAB | _
3 | LEMMA | VOCAB | _
4 | GOV | INT | _
5 | LABEL | VOCAB | _
6 | SENT_SEG | INT | _

An example of an mcf corresponding to this mcd can be found here.
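
As a rough illustration of the four-column layout described above, here is a minimal Python sketch that reads mcd lines into records. It rests on two assumptions that this description does not confirm: that the fields on a line are separated by whitespace, and that `_` in the fourth column means "no embedding file". The names `McdColumn` and `parse_mcd` are hypothetical and are not part of any existing tool.

```python
from dataclasses import dataclass
from typing import List, Optional

# The three value types named in the format description.
VALID_TYPES = {"VOCAB", "INT", "EMB"}


@dataclass
class McdColumn:
    index: int                # mcf column being described
    label: str                # label given to that column
    kind: str                 # VOCAB, INT or EMB
    emb_file: Optional[str]   # embedding file name, or None


def parse_mcd(text: str) -> List[McdColumn]:
    """Parse the contents of an mcd file.

    Assumption: fields are whitespace-separated and "_" in the
    fourth field stands for "no embedding file".
    """
    columns = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        index, label, kind, emb = fields
        if kind not in VALID_TYPES:
            raise ValueError(f"line {line_no}: unknown type {kind!r}")
        columns.append(
            McdColumn(int(index), label, kind, None if emb == "_" else emb)
        )
    return columns


# The example from this document, written with whitespace separators.
example = """\
1 FORM VOCAB _
2 POS VOCAB _
3 LEMMA VOCAB _
4 GOV INT _
5 LABEL VOCAB _
6 SENT_SEG INT _
"""

columns = parse_mcd(example)
# Map each label to the mcf column it describes.
label_to_index = {c.label: c.index for c in columns}
```

A label-to-index mapping like `label_to_index` is one way to look up an mcf column by name (for instance `GOV`) instead of hard-coding its position.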