... | @@ -4,15 +4,17 @@ The Multi Column Files (mcf) format is the text format used to represent text an |
The list of labels is the following:
* **FORM** form of the word
* **CPOS** coarse part of speech
* **POS** part of speech
* **LEMMA** lemma
* **FEATS** other linguistic features (usually morphological)
* **GOV** relative position of the governor (-n indicates that the governor is n words to the left, n indicates that it is n words to the right)
* **LABEL** label of the syntactic dependency
* **SENT_SEG** indicates that the word is the last word in the sentence
* **A** to **Z** other labels used to represent other useful information (word duration, speaker, ...)
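
The labels above can be read programmatically. The following is a minimal sketch, not the reference implementation: it assumes tab-separated columns, the six-column layout used in the example in this document, and that a **SENT_SEG** value of `1` marks the last word of a sentence. The function names (`parse_mcf_line`, `governor_index`, `split_sentences`) are hypothetical, and only the first data row comes from this document.

```python
# Hypothetical column layout, matching the six-column example in this document.
COLUMNS = ["FORM", "POS", "LEMMA", "GOV", "LABEL", "SENT_SEG"]


def parse_mcf_line(line, columns=COLUMNS, sep="\t"):
    """Map the fields of one mcf line to their labels (tab separator is an assumption)."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} columns, got {len(fields)}")
    return dict(zip(columns, fields))


def governor_index(position, gov):
    """Turn a relative GOV offset into an absolute word index.

    A negative offset points to the left, a positive one to the right.
    """
    return position + int(gov)


def split_sentences(words):
    """Group words into sentences, closing one after each word whose
    SENT_SEG is "1" (assumed end-of-sentence marker)."""
    sentences, current = [], []
    for word in words:
        current.append(word)
        if word["SENT_SEG"] == "1":
            sentences.append(current)
            current = []
    if current:  # keep trailing words that have no end marker
        sentences.append(current)
    return sentences


# Illustrative rows; only the first one is taken from this document.
lines = [
    "la\tdet\tle\t1\tdet\t0",
    "pomme\tnc\tpomme\t1\tsuj\t0",
    "tombe\tv\ttomber\t0\troot\t1",
]
words = [parse_mcf_line(line) for line in lines]
for i, word in enumerate(words):
    governor = words[governor_index(i, word["GOV"])]
    print(word["FORM"], "->", governor["FORM"])
```

How the root of a sentence is encoded is not stated in this section; in the sketch an offset of `0` simply resolves to the word itself.
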
Here is an example of two sentences represented as an mcf file. The first column corresponds to **FORM**, the second to **POS**, the third to **LEMMA**, the fourth to **GOV**, the fifth to **LABEL**, and the last to **SENT_SEG**.
la | det | le | 1 | det | 0
---- | ---- | ---- | ---- | ---- | ----
... | | ... | |