Interaction between morpho-syntactic and MWE annotation
We have used a binary classification of MWEs into syntactically regular and syntactically irregular ones (as proposed, e.g., by Candito and Constant (2014)). The MWEs were classified primarily based on their POS pattern, using an automatic pre-classification followed by a manual check.
Regular MWEs get no part-of-speech of their own (their distribution is predictable from their internal structure), and get a regular syntactic representation.
Irregular MWEs get a part-of-speech and are represented using a flat syntactic structure, with all the non-first components attached to the first one (with the "dep_cpd" label in the FTBdep scheme, and the "fixed" label in the UD scheme).
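The flat representation of irregular MWEs can be sketched as a small tree transformation. This is a minimal illustration, not the corpus conversion tool: the `Token` class and `flatten_irregular_mwe` function are hypothetical, and three behaviours are assumptions rather than statements from these guidelines: the first component inherits the attachment of whichever component was governed from outside the MWE, that relation is kept without relabelling, and outside dependents of non-first components are moved to the first component. Where the MWE's own POS is stored is not covered.

```python
from dataclasses import dataclass

# Label given to non-first components, per annotation scheme (labels from the text).
FLAT_LABEL = {"FTBdep": "dep_cpd", "UD": "fixed"}


@dataclass
class Token:
    id: int      # 1-based position in the sentence
    form: str
    head: int    # id of the governor, 0 for the root
    deprel: str


def flatten_irregular_mwe(tokens: list[Token], mwe_ids: list[int], scheme: str) -> None:
    """Give an irregular MWE a flat structure, modifying `tokens` in place."""
    by_id = {t.id: t for t in tokens}
    inside = set(mwe_ids)
    # The component governed from outside the MWE carries its external attachment.
    roots = [by_id[i] for i in mwe_ids if by_id[i].head not in inside]
    if len(roots) != 1:
        raise ValueError("expected exactly one externally attached component")
    ext_head, ext_rel = roots[0].head, roots[0].deprel
    first, *rest = sorted(mwe_ids)
    # The first component takes over that attachment (the relation is not relabelled).
    by_id[first].head, by_id[first].deprel = ext_head, ext_rel
    # All other components attach to the first one.
    for tid in rest:
        by_id[tid].head, by_id[tid].deprel = first, FLAT_LABEL[scheme]
    # Assumption: outside dependents of non-first components move to the first one.
    for t in tokens:
        if t.id not in inside and t.head in rest:
            t.head = first
```

For "à travers" in "Il marche à travers les champs", starting from a regular analysis where "travers" heads "à" and "champs", the function makes "à" the head of the MWE, attaches "travers" to it as "fixed" (or "dep_cpd"), and reattaches "champs" to "à". Regular MWEs are simply left untouched.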
Syntactically regular vs irregular MWE
A MWE is considered regular if it is both internally and "externally" regular, namely:
- (i) it has a regular internal structure
- (ii) and, if so, its distribution is possible for such a structure.
A MWE is considered irregular iff:
- not (i) : it is not internally regular
- or, (i) but not (ii) : it is internally regular but not externally
MWEs containing cranberry words (i.e. components not appearing in any other context) are systematically considered irregular.
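The decision rule above can be written as a short function over the annotator's judgements. This is only a sketch: the judgements themselves (cranberry word, conditions (i) and (ii)) are assumed to be decided beforehand, and the `Judgement` type, the `classify` function and the reason strings are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Judgement(NamedTuple):
    """Annotator's judgements on one MWE (the inputs are decided manually)."""
    has_cranberry_word: bool   # some component occurs in no other context
    internally_regular: bool   # condition (i)
    externally_regular: bool   # condition (ii); only consulted when (i) holds


def classify(j: Judgement) -> tuple[str, str]:
    """Return ("regular" or "irregular", reason) according to (i) and (ii)."""
    if j.has_cranberry_word:
        return "irregular", "contains a cranberry word"
    if not j.internally_regular:
        return "irregular", "not (i): internally irregular"
    if not j.externally_regular:
        return "irregular", "(i) but not (ii): externally irregular"
    return "regular", "(i) and (ii)"
```

Returning the reason alongside the label makes it easy to see which clause fired, e.g. "(i) but not (ii)" for "à travers", which is internally regular but externally irregular.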
In the absence of cranberry words, the regularity of a structure is primarily determined by the part-of-speech pattern of the MWE: can we have a non-MWE sequence with the same POS pattern?
For instance, the POS pattern of pour l'instant is PREP DET NOUN, for which a regular structure can be found.
Sometimes the POS granularity is too coarse, in which case we considered POS subclasses. Examples:
- for the pattern PREP NOUN, the absence of a determiner was considered regular for some prepositions (e.g. "en" or "par") but not for others (e.g. "entre" without any determiner)
- for the pattern PREP + ADV, the types of the PREP and of the ADV have to be considered:
  - e.g. temporal PREP + temporal ADV is regular ("avant maintenant")
  - e.g. PREP + locative ADV is regular for a few prepositions ("de/par/vers ailleurs/ici/là"), hence the MWE "par ailleurs" is considered regular
  - e.g. PREP + tout is regular ("en/avec/pour/après/avant/à tout")
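The automatic pre-classification by POS pattern, refined by lexical subclasses, might look roughly like the sketch below. It is an assumption-laden illustration, not the tool actually used: the function name, the (lemma, POS) input format and the word lists are hypothetical, and the lists contain only the examples given above, so they are far from exhaustive. Treating "tout" as ADV follows the grouping above. Uncovered cases return `None`, standing for "left to the manual check".

```python
from typing import Optional

# Illustrative, incomplete lexical subclasses, limited to the examples listed above.
BARE_NOUN_OK = {"en", "par"}         # PREP NOUN without determiner: regular
BARE_NOUN_KO = {"entre"}             # PREP NOUN without determiner: irregular
TEMPORAL_PREPS = {"avant"}
TEMPORAL_ADVS = {"maintenant"}
LOCATIVE_PREPS = {"de", "par", "vers"}
LOCATIVE_ADVS = {"ailleurs", "ici", "là"}
TOUT_PREPS = {"en", "avec", "pour", "après", "avant", "à"}

# Coarse POS patterns treated as regular whatever the words.
REGULAR_PATTERNS = {("PREP", "DET", "NOUN")}


def preclassify(tokens: list[tuple[str, str]]) -> Optional[bool]:
    """Guess internal regularity from (lemma, POS) pairs.

    Returns True (regular), False (irregular) or None when the pattern or the
    lexical subclass is not covered, leaving the decision to the manual check.
    """
    pattern = tuple(pos for _, pos in tokens)
    lemmas = [lemma for lemma, _ in tokens]
    if pattern in REGULAR_PATTERNS:
        return True
    if pattern == ("PREP", "NOUN"):
        if lemmas[0] in BARE_NOUN_OK:
            return True
        if lemmas[0] in BARE_NOUN_KO:
            return False
        return None
    if pattern == ("PREP", "ADV"):
        prep, adv = lemmas
        if prep in TEMPORAL_PREPS and adv in TEMPORAL_ADVS:
            return True
        if prep in LOCATIVE_PREPS and adv in LOCATIVE_ADVS:
            return True
        if prep in TOUT_PREPS and adv == "tout":
            return True
    return None
```

With these lists, "pour l'instant" and "par ailleurs" come out regular, "entre" + bare noun irregular, and anything else is deferred to the annotator.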
"Externally" regular versus irregular
This distinction is only relevant for internally regular MWEs.
When the distribution of the sequence is possible for such a structure, it is considered externally regular. The distribution covers both the governors and the dependents the structure can take, hereafter the "passive valency" (which potential governors) and the "active valency" (which dependents).
Examples of externally regular MWEs:
"pour l'instant" has the distribution of a prepositional phrase, as expected from its internal structure.
"dans le cadre" has a regular passive valency (it takes as governors the governors expected for PPs), and its active valency is possible for such a pattern (it takes a PP complement with preposition "de", which is possible for such a structure).
Examples of externally irregular MWEs:
le temps (de Vinf / que Psubj): the structure DET + NOUN is internally regular, but the passive valency is not fully regular (the noun "temps" cannot normally function as a modifier of a verb or sentence). Further, while a regular "DET N + que Psubj" sequence has the distribution of an NP, this is not the case for "le temps + que Psubj" (which cannot be a subject, for instance).
à travers: the pattern "PREP NOUN" can be considered regular, but the active valency of "à travers" is not regular: it can take a direct NP as complement (à travers les champs, "across the fields"), which is not possible for a regular "PREP NOUN" sequence.
d'abord exhibits the same regular pattern "PREP NOUN", but its passive valency is not fully regular: for instance, "d'abord" can pre-modify a noun (elle a été d'abord pilote, puis instructrice, "she was first a pilot, then an instructor").
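One way to make the external-regularity test concrete is to treat it as set inclusion: an internally regular MWE is externally regular if every governor and dependent it is observed with is also allowed for its internal structure. The sketch below is hypothetical throughout: the `Distribution` type, the category labels and the expected PP distribution are invented for illustration and do not come from the annotation scheme; only the verdicts for "dans le cadre", "à travers" and "d'abord" follow the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Distribution:
    governors: frozenset[str]   # passive valency: possible governors
    dependents: frozenset[str]  # active valency: possible dependents


# Hypothetical expected distribution for a regular PREP (DET) NOUN phrase.
PP_EXPECTED = Distribution(
    governors=frozenset({"verb", "clause", "noun (post-modifier)"}),
    dependents=frozenset({"PP[de]"}),
)


def valency_violations(observed: Distribution, expected: Distribution) -> set[str]:
    """Return which valencies exceed what the internal structure allows.

    An empty set means the MWE is externally regular.
    """
    violations = set()
    if not observed.governors <= expected.governors:
        violations.add("passive")
    if not observed.dependents <= expected.dependents:
        violations.add("active")
    return violations
```

Under these made-up labels, "dans le cadre" (PP governors, a "de" complement) yields no violation, "à travers" (direct NP complement) yields an active-valency violation, and "d'abord" (noun pre-modifier) yields a passive-valency violation.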
POS of MWEs
Because we chose to consider as regular only those MWEs that are both internally and externally regular, all regular MWEs have a distribution that can be predicted from their internal structure, hence there is no need to specify a POS for the MWE. So only irregular MWEs get a POS. (Note that a finer distinction would be to have a regular syntactic representation for any internally regular MWE, and a POS if the MWE is not externally regular.)
Interaction between morpho-syntactic annotation and Named entities
Currently, the syntactic representation of named entities follows the regular syntactic scheme. Named entities nonetheless receive a proper noun POS.