... | ... | @@ -14,7 +14,6 @@ Similarly to _PARSEME:MWE_, the information in the 11th column called _PARSEME-F |
3. a list of semicolon-separated **codes** if the current word is part of one or more MWEs/NEs. Codes are only assigned to the lexicalized components of a MWE/NE (see [Lexicalized components and open slots](http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.1/?page=lexicalized) in the PARSEME annotation guidelines).
- If the current line contains the first lexicalized component of the MWE/NE in the sentence, the code consists of an **identifier** followed by a colon ':' and a **pos-category-criteria label**:
* the **identifier** of a MWE/NE is an integer (starting at 1), and is unique within the sentence.
<!--are integers starting from 1 for each new sentence, and increased by 1 for each new annotation.-->
* **pos-category-criteria labels** are strings that describe the MWE/NE. Each label is composed of three fields separated by the pipe character '|' (i.e. `POS|CATEGORY|CRITERION1,CRITERION2...`):
1. **POS** is a tag representing the part of speech of the whole MWE/NE. The tags were inferred automatically using heuristics, or defined manually for irregular constructions. **[MARIE add link to POS details here if relevant](XXX)**.
2. **CATEGORY** is a tag whose value depends on the type of entity being annotated. It consists of a prefix and a suffix, separated by a dash.
... | ... | |
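The code format described above (semicolon-separated codes, each made of an identifier, a colon, and a three-field label) can be illustrated with a minimal Python sketch. It is not part of the PARSEME-FR tooling: the names `MweCode` and `parse_codes` are hypothetical, the field values in the usage line are placeholders rather than real tags, and the handling of a code without a colon (treated as a bare identifier for a non-initial component) is an assumption that this excerpt does not confirm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MweCode:
    """One code from the column (hypothetical helper type)."""
    identifier: int
    pos: Optional[str] = None
    category: Optional[str] = None
    criteria: Optional[List[str]] = None


def parse_codes(cell: str) -> List[MweCode]:
    """Split a cell into codes of the form ID:POS|CATEGORY|CRIT1,CRIT2."""
    codes = []
    for raw in cell.split(";"):
        ident, sep, label = raw.partition(":")
        if not sep:
            # Assumption: a code without ':' carries only the identifier.
            codes.append(MweCode(int(ident)))
            continue
        # The label has exactly three pipe-separated fields;
        # a malformed label raises ValueError here.
        pos, category, criteria = label.split("|", 2)
        codes.append(
            MweCode(
                identifier=int(ident),
                pos=pos,
                category=category,
                criteria=criteria.split(",") if criteria else [],
            )
        )
    return codes


# Placeholder values only; real POS, category, and criterion tags differ.
codes = parse_codes("1:POS|PREFIX-SUFFIX|CRITERION1,CRITERION2;2")
```

The category field is kept as a single string here; splitting it on the dash into prefix and suffix would be a small extension once the set of valid prefixes is known.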