|
|
# Interaction between morpho-syntactic and MWE annotation
|
|
|
|
|
|
We use a binary classification of MWEs into syntactically regular and syntactically irregular ones, as proposed e.g. by Candito and Constant (2014).
|
|
|
The MWEs were classified primarily on the basis of their POS pattern (an automatic pre-classification followed by a manual check).
|
|
|
|
|
|
Regular MWEs get no part-of-speech of their own (their distribution is predictable from their internal structure), and get a regular syntactic representation.
|
|
|
|
|
|
Irregular MWEs get a part-of-speech and are represented with a flat syntactic structure, in which all non-first components are attached to the first one (with the "dep_cpd" label in the FTBdep scheme and the "fixed" label in the UD scheme).
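
The flat representation described above can be sketched as follows. This is only an illustration, not the tooling used for the treebank: the function name `flat_mwe_arcs`, the scheme keys `"UD"` and `"FTBdep"`, and the use of token indices are assumptions made for the example. Only the two labels come from the description above.

```python
from typing import List, Tuple

# Label used for the flat attachment in each scheme (labels as given above;
# the dictionary keys are names chosen for this sketch).
FLAT_LABELS = {"UD": "fixed", "FTBdep": "dep_cpd"}


def flat_mwe_arcs(component_ids: List[int], scheme: str = "UD") -> List[Tuple[int, int, str]]:
    """Attach every non-first component of an irregular MWE to the first one.

    Returns a list of (dependent, head, label) arcs over token indices.
    """
    if scheme not in FLAT_LABELS:
        raise ValueError(f"unknown scheme: {scheme}")
    if not component_ids:
        return []
    head = component_ids[0]
    return [(dep, head, FLAT_LABELS[scheme]) for dep in component_ids[1:]]
```

For a two-token irregular MWE occupying token positions 4 and 5, `flat_mwe_arcs([4, 5])` yields a single arc attaching token 5 to token 4 with the label "fixed". The attachment of the first component to the rest of the sentence is outside the scope of this sketch.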
|
|
|
|
|
|
|
|
|
## Syntactically regular vs irregular MWE
|
|
|
|
|
|
An MWE is considered **regular** if it is both internally and "externally" regular, namely:
|
|
|
- (i) it has a regular internal structure
|
|
|
- (ii) and, if so, its distribution is one that is possible for such a structure.
|
|
|
|
|
|
An MWE is considered **irregular** iff:
|
|
|
- not (i) : it is not internally regular
|
|
|
- or, (i) but not (ii) : it is internally regular but not externally
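
The two definitions above are complementary, which a small truth-table check makes explicit. The function names and boolean parameters below are assumptions chosen for this sketch; how (i) and (ii) are actually decided is described in the following subsections.

```python
from itertools import product


def is_regular(internally_regular: bool, externally_regular: bool) -> bool:
    """Regular: condition (i) and condition (ii) both hold."""
    return internally_regular and externally_regular


def is_irregular(internally_regular: bool, externally_regular: bool) -> bool:
    """Irregular: not (i), or (i) but not (ii)."""
    return (not internally_regular) or (internally_regular and not externally_regular)


# Every combination of (i) and (ii) is either regular or irregular, never both.
for i, ii in product([True, False], repeat=2):
    assert is_irregular(i, ii) == (not is_regular(i, ii))
```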
|
|
|
|
|
|
### Internal regularity
|
|
|
|
|
|
MWEs containing cranberry words (i.e. components not appearing in any other context) are systematically considered irregular.
|
|
|
|
|
|
In the absence of a cranberry word, the regularity of a structure is primarily determined by the part-of-speech pattern of the MWE: can a non-MWE sequence have the same POS pattern?
|
|
|
|
|
|
For instance, the POS pattern of _pour l'instant_ is *PREP DET NOUN*, for which a regular structure can be found.
|
|
|
|
|
|
Sometimes the POS granularity is too coarse, and we considered subclasses of POS. Examples:
|
|
|
- for the pattern *PREP NOUN*, the absence of a determiner was considered regular for some prepositions (e.g. "en" or "par") but not for others (e.g. "entre" without any determiner),
|
|
|
|
|
|
- for the pattern *PREP + ADV*, the type of PREP and ADV has to be considered:
|
|
|
  - e.g. _temporal\_PREP + temporal\_ADV_ is regular ("_avant maintenant_")
|
|
|
  - e.g. _PREP + locative\_ADV_ is regular for a few prepositions ("_de/par/vers ailleurs/ici/là_"), hence the MWE "_par ailleurs_" is considered regular.
|
|
|
  - e.g. _PREP + tout_ is regular ("_en/avec/pour/après/avant/à tout_")
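
The internal-regularity test above can be sketched as a small rule-based pre-classifier. This is a toy illustration under stated assumptions, not the actual pre-classification used: all names (`is_internally_regular`, the word sets) are hypothetical, the word sets contain only the examples quoted above and are not exhaustive, the cranberry entry "fur" (as in "au fur et à mesure") is an illustrative choice, and the temporal subclass is left out because no word lists are given for it. Cases the sketch does not cover return `None`, standing in for the manual check.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (lowercased form, coarse POS tag)

# Hypothetical toy lexicon: illustrative, deliberately incomplete.
CRANBERRY_WORDS = {"fur"}
BARE_NOUN_PREPS = {"en", "par"}        # PREP NOUN without determiner is regular
NO_BARE_NOUN_PREPS = {"entre"}         # ... but not for these prepositions
LOCATIVE_PREPS = {"de", "par", "vers"}
LOCATIVE_ADVS = {"ailleurs", "ici", "là"}
TOUT_PREPS = {"en", "avec", "pour", "après", "avant", "à"}


def is_internally_regular(tokens: List[Token]) -> Optional[bool]:
    """Return True/False when a rule applies, None when a manual check is needed."""
    forms = [form for form, _ in tokens]
    pattern = tuple(pos for _, pos in tokens)

    # MWEs containing a cranberry word are systematically irregular.
    if any(form in CRANBERRY_WORDS for form in forms):
        return False
    if pattern == ("PREP", "DET", "NOUN"):
        return True
    if len(tokens) == 2 and pattern[0] == "PREP":
        prep, second = forms
        if second == "tout" and prep in TOUT_PREPS:
            return True
        if pattern[1] == "ADV" and prep in LOCATIVE_PREPS and second in LOCATIVE_ADVS:
            return True
        if pattern[1] == "NOUN":
            if prep in BARE_NOUN_PREPS:
                return True
            if prep in NO_BARE_NOUN_PREPS:
                return False
    return None  # not decided by this sketch
```

With these toy sets, "pour l'instant" (PREP DET NOUN) and "par ailleurs" (PREP + locative ADV) come out as internally regular, in line with the examples above.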
|
|
|
|
|
|
### "Externally" regular versus irregular
|
|
|
|
|
|
This distinction is only relevant for internally regular MWEs.
|
|
|
|
|
|
When the distribution of the sequence is possible for such a structure, the MWE is considered externally regular. The distribution covers both which governors the structure can attach to (hereafter the "passive valency") and which dependents it can take (the "active valency").
|
|
|
|
|
|
Examples of externally regular MWEs:
|
|
|
|
|
|
- "_pour l'instant_" has the distribution of a prepositional phrase, as expected from its internal structure.
|
|
|
|
|
|
- "_dans le cadre_" has a regular passive valency (it takes as governors the expected governors for PPs), and its active valency is possible for such a pattern (it takes a PP complement with the preposition "de", which is possible for such a structure).
|
|
|
|
|
|
Examples of externally irregular MWEs:
|
|
|
- _le temps (de Vinf / que Psubj)_ : the structure DET + NOUN is internally regular, but the passive valency is not fully regular (the noun "temps" cannot normally function as a modifier of a verb or sentence). Further, a regular "DET N + que Psubj" sequence has the distribution of an NP, which is not the case for "le temps + que Psubj" (it cannot be a subject, for instance).
|
|
|
|
|
|
- _à travers_ : the pattern "PREP NOUN" can be considered regular, but the active valency of "_à travers_" is not regular: it can take a direct NP as complement (_à travers les champs_ (across the fields)), which is not the case for a regular "PREP NOUN" sequence.
|
|
|
|
|
|
- _d'abord_ exhibits the same regular pattern "PREP NOUN", but its passive valency is not fully regular. For instance, "_d'abord_" can pre-modify a noun (_elle a été d'abord pilote, puis instructrice_ (she was first a pilot, then an instructor)).
|
|
|
|
|
|
## POS of MWEs
|
|
|
|
|
|
Because we count as regular only those MWEs that are both internally and externally regular, all regular MWEs have a distribution that can be predicted from their internal structure, hence there is no need to specify a POS for the MWE.
|
|
|
So only irregular MWEs get a POS.
|
|
|
(Note that a finer-grained alternative would be to give a regular syntactic representation to any internally regular MWE, and to add a POS if the MWE is not externally regular.)
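
The difference between the scheme adopted here and the finer-grained alternative mentioned in the note can be sketched as follows. The class and function names are hypothetical, and the POS value passed in is purely illustrative. The treatment of internally irregular MWEs in the alternative (flat structure plus POS) is an assumption, since the note does not spell it out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MweAnnotation:
    flat_structure: bool   # True: flat representation; False: regular syntax
    pos: Optional[str]     # POS of the MWE as a whole, if any


def annotate_current(internal: bool, external: bool, pos: str) -> MweAnnotation:
    """Adopted scheme: only fully regular MWEs get regular syntax and no POS."""
    regular = internal and external
    return MweAnnotation(flat_structure=not regular, pos=None if regular else pos)


def annotate_finer(internal: bool, external: bool, pos: str) -> MweAnnotation:
    """Alternative from the note: regular syntax for any internally regular MWE,
    plus a POS whenever the MWE is not externally regular.
    Assumption: internally irregular MWEs stay flat and keep a POS."""
    fully_regular = internal and external
    return MweAnnotation(flat_structure=not internal, pos=None if fully_regular else pos)
```

The two functions differ only for MWEs that are internally regular but externally irregular, such as "_à travers_": the adopted scheme gives them a flat structure, whereas the alternative would keep the regular structure and add a POS.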
|
|
|
|
|
|
<!--Some prepositions take a PP complement (jusque (chez Paul/sous la table / à Paris / en Auvergne), de (sous la table/ entre les fils)).-->
|
|
|
|
|
|
# Interaction between morpho-syntactic annotation and Named entities
|
|
|
|
|
|
Currently, the syntactic representation of named entities follows the regular syntactic scheme.
|
|
|
Yet, named entities receive a proper noun POS.