Multi Column Format
The Multi Column Format (mcf) is the text format of the files used to represent text and its annotations. Every line of an mcf file corresponds to an atomic unit of text (loosely called a word). Each column describes an attribute of that unit, and columns are separated by tab characters. An mcf file can have any number of columns. Columns can be associated with a label via an mcd file. Associating a column with a label makes the content of that column accessible through Word Features.
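As a minimal sketch of the idea (not the toolkit's own reader), the Python below splits one mcf line on tabs and exposes its fields by label. The `COLUMNS` dictionary is a hypothetical stand-in for the column-to-label association that an mcd file provides. The mcd syntax is not described here, so the sketch does not parse it. `parse_mcf_line` is likewise an illustrative name.

```python
# Hypothetical label -> 0-based column mapping; in practice this association
# comes from an mcd file, which this sketch does not parse.
COLUMNS = {"FORM": 0, "POS": 1, "LEMMA": 2, "GOV": 3, "LABEL": 4, "SENT_SEG": 5}


def parse_mcf_line(line: str, columns: dict[str, int]) -> dict[str, str]:
    """Split one mcf line on tabs and return its fields keyed by label."""
    fields = line.rstrip("\r\n").split("\t")
    return {label: fields[index] for label, index in columns.items()}


word = parse_mcf_line("chantait\tv\tchanter\t0\troot\t0", COLUMNS)
print(word["LEMMA"])  # chanter
```

Because labels are resolved through the mapping rather than hard-coded positions, the same code works for a file with a different column order once the mapping is changed.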
Here is an example of two sentences represented as an mcf file. The first column corresponds to FORM, the second to POS, the third to LEMMA, the fourth to GOV, the fifth to LABEL, and the last to SENT_SEG. The mcd corresponding to this file can be found here
FORM | POS | LEMMA | GOV | LABEL | SENT_SEG |
---|---|---|---|---|---|
la | det | le | 1 | det | 0 |
diane | nc | diane | 1 | suj | 0 |
chantait | v | chanter | 0 | root | 0 |
dans | prep | dans | -1 | mod | 0 |
la | det | le | 1 | det | 0 |
cour | nc | cour | -2 | obj | 0 |
des | prep | des | -1 | dep | 0 |
casernes | nc | caserne | -1 | obj | 0 |
. | poncts | . | -6 | eos | 1 |
et | coo | et | 0 | root | 0 |
le | det | le | 1 | det | 0 |
vent | nc | vent | 3 | suj | 0 |
du | prep | du | -1 | dep | 0 |
matin | nc | matin | -1 | obj | 0 |
soufflait | v | souffler | -5 | dep_coord | 0 |
sur | prep | sur | -1 | mod | 0 |
les | det | le | 1 | det | 0 |
lanternes | nc | lanterne | -2 | obj | 0 |
. | poncts | . | -9 | eos | 1 |
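The GOV values in this example read as offsets. Each one points to the governor's position relative to the current word, with 0 marking the root. SENT_SEG is 1 only on the last word of each sentence. This is an interpretation of the example, not a documented specification. Under that reading, the sketch below groups lines into sentences and converts the offsets into absolute positions. The function names and the fixed column positions are hypothetical. A real reader would take the positions from the mcd file.

```python
from typing import Iterable, Optional

GOV, SENT_SEG = 3, 5  # 0-based column positions in the example above (assumed layout)


def read_sentences(lines: Iterable[str]) -> list[list[list[str]]]:
    """Group mcf lines into sentences; a sentence closes on a word whose SENT_SEG is 1."""
    sentences: list[list[list[str]]] = []
    current: list[list[str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines carry no word and are skipped
        fields = line.split("\t")
        current.append(fields)
        if fields[SENT_SEG] == "1":
            sentences.append(current)
            current = []
    if current:  # trailing words with no end-of-sentence mark
        sentences.append(current)
    return sentences


def absolute_governors(sentence: list[list[str]]) -> list[Optional[int]]:
    """Turn relative GOV offsets into 0-based positions in the sentence (None for the root)."""
    heads: list[Optional[int]] = []
    for position, fields in enumerate(sentence):
        offset = int(fields[GOV])
        heads.append(None if offset == 0 else position + offset)
    return heads


# The example file, with single spaces standing in for tabs (no field contains a space).
EXAMPLE = """\
la det le 1 det 0
diane nc diane 1 suj 0
chantait v chanter 0 root 0
dans prep dans -1 mod 0
la det le 1 det 0
cour nc cour -2 obj 0
des prep des -1 dep 0
casernes nc caserne -1 obj 0
. poncts . -6 eos 1
et coo et 0 root 0
le det le 1 det 0
vent nc vent 3 suj 0
du prep du -1 dep 0
matin nc matin -1 obj 0
soufflait v souffler -5 dep_coord 0
sur prep sur -1 mod 0
les det le 1 det 0
lanternes nc lanterne -2 obj 0
. poncts . -9 eos 1"""

sentences = read_sentences(row.replace(" ", "\t") for row in EXAMPLE.splitlines())
print(len(sentences))                    # 2
print(absolute_governors(sentences[0]))  # [1, 2, None, 2, 5, 3, 5, 6, 2]
print(absolute_governors(sentences[1]))  # [None, 2, 5, 2, 3, 0, 5, 8, 6, 0]
```

Positions are 0-based within each sentence. In the first sentence, diane (position 1) is therefore governed by chantait (position 2), and the final period also attaches to chantait.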