... | ... | @@ -23,35 +23,34 @@ PARSEME-FR annotation guidelines - v1.0 |
|
|
|
|
|
<!--**Verbal** multi-word expressions were handled separately, within the international PARSEME project, as part of the corpus production for the [_PARSEME shared task 1.1 (2018)_](http://multiword.sourceforge.net/sharedtask2018/). We have therefore adopted the external [PARSEME v.1.1](http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.1) guide for identifying verbal expressions. Members of the French PARSEME-FR project were heavily involved in the international PARSEME project, and in particular in writing the annotation guide for verbal expressions. As a result, the two guides, PARSEME (verbal expressions) and PARSEME-FR (non-verbal expressions), are compatible and similar in spirit.-->
|
|
|
|
|
|
#### Nominal expressions : distinguishing named entities from other MWEs
|
|
|
#### Nominal expressions : distinguishing "named entities" from other MWEs
|
|
|
|
|
|
For **nominal multi-word expressions**, we use a primary distinction concerning the naming convention that links the expression and the entity or entities the expression can refer to. The starting intuition is that one can distinguish:
|
|
|
- (1) **entity names** : some nominal MWEs work as the **direct name of a specific entity** (for instance *Anna Duval*)
|
|
|
- (2) versus **instantiable concept names**, working as the name of a concept, which can be used to refer to instances of this concept (e.g. *neural network*).
|
|
|
|
|
|
In this latter case, knowing the defining characteristics of the concept enables one to use the name for future instances, without having to learn any new naming convention. This contrasts with entity names: in order to use the name *Anna Duval* for a new person, one needs to learn a new naming convention linking the name and this new person, and the characteristics of the person play almost no role (to be precise, in such an example the name does tell us that the person should be a woman).
|
|
|
In this latter case, knowing the defining characteristics of the concept enables one to use the name for future instances, without having to learn any new naming convention. This contrasts with entity names: in order to use the name *Anna Duval* for a new person, one needs to learn a new naming convention linking the name to this new person, and the characteristics of the person play almost no role (to be precise, in such an example the name does tell us that the person should be a woman). Note that
|
|
|
- an entity name may well be ambiguous (e.g. several people bearing the same name): the key differentiating trait between (1) and (2) is whether or not there must be a naming convention at the level of each entity (Kleiber, 2007)
|
|
|
- for concept names of course there is also a naming convention (why use the noun *table* for a table), but it is defined at the level of the class of entities, not at the level of each entity.
|
|
|
|
|
|
This distinction between entity name and instantiable concept name is reminiscent of the proper noun versus common noun distinction, but this latter distinction is not so easy to define precisely. Of course, lexical items that are exclusively used for directly naming entities (e.g. the first and last names for people) are easy to classify as proper nouns. This is why Ehrmann (2008) roughly defines proper nouns as the "désignation d’une entité précise par le biais d’une description dont le sens joue un rôle mineur par rapport à la dénomination, opérant directement, du référent" ("the designation of a precise entity via a description whose meaning plays a minor role with respect to the denomination of the referent, which operates directly").
|
|
|
But an abundant literature shows that the proper / common noun distinction proves difficult to characterize in linguistic terms (we refer primarily to Kleiber (1981) and Ehrmann (2008) for a state of the art). Indeed, within entity names, we can distinguish:
|
|
|
This distinction between entity name and instantiable concept name is reminiscent of the proper noun versus common noun distinction, but this latter distinction is not so easy to define precisely. Of course, lexical items that are exclusively used for directly naming entities (e.g. the first and last names for people) are easy to classify as proper nouns (sometimes called **pure proper nouns**). This is why Ehrmann (2008) roughly defines proper nouns as the "désignation d’une entité précise par le biais d’une description dont le sens joue un rôle mineur par rapport à la dénomination, opérant directement, du référent" ("the designation of a precise entity via a description whose meaning plays a minor role with respect to the denomination of the referent, which operates directly").
|
|
|
But an abundant literature shows that the proper / common noun distinction proves difficult to characterize in linguistic terms (we refer primarily to Kleiber (2001; 2007) and Ehrmann (2008) for a state of the art). Indeed, within entity names, we can distinguish:
|
|
|
- (1a) entity names composed of lexical items that are dedicated to naming entities (in short: proper nouns), such as *Italy*, *Anna Duval*, *Microsoft*
|
|
|
- (1b) entity names that have a descriptive basis, such as the "International League against Racism and Anti-Semitism" or the "Massif central" (literally the "central massif"): the naming convention between the entity and the name is sociologically typical of a proper noun (the name of an association, of a geographical item), but it also clearly results from the compatibility between the characteristics of the entity and the meaning of the lexical items
|
|
|
- (1c) but also names which serve to designate unique abstract entities (sometimes called "unica"), such as abstract simple nouns ("taxidermy") or abstract MWEs ("Euclidean geometry", "machine translation"): although not intuitively classified as proper nouns, they are still the name of a specific entity (or of a concept with one instance only), for which the speakers have to learn the naming convention.
|
|
|
|
|
|
- (1c) but also names which serve to designate unique abstract entities, such as abstract simple nouns (*taxidermy*) or abstract MWEs (*Euclidean geometry*, *machine translation*), and names referring to unique concrete entities such as the sun or the moon (often called "unica"): because the entity that can be called that way is unique, they too can be viewed as entity names, for which the speakers have to learn the naming convention.
|
|
|
|
|
|
In NLP, cases (1a) and (1b) fall into the badly named category of **named entities** (we keep this term in the following, although "named entity" should refer to the entity and not the name: we will use it, as usual in the NLP community, for entity names).
|
|
|
However, unique abstract terms (*machine translation*) are traditionally not viewed as proper nouns, and concrete unica like the moon are widely debated. Kleiber (2007) argues that unica terms necessarily name unique entities, whereas this is not the case for entity names (cf. the ambiguity of entity names noted above). Kleiber (1995) argues that the moon is viewed as a unique entity, whereas Mars is a name that serves to identify a particular planet within the class of planets. While these arguments seem arbitrary to us, we follow the tradition of considering (1b) cases as proper nouns, and (1c) cases as common nouns.
|
|
|
|
|
|
TO BE CONTINUED
|
|
|
Moreover, named entities in NLP are associated with a predefined semantic type.
|
|
|
|
|
|
In the PARSEME-FR annotations, we wanted to keep a clear distinction between:
|
|
|
- cases (1a)/(1b) on the one hand, annotated as named entities (NE), using a dedicated guide
|
|
|
- more precisely, we only consider certain semantic types of named entities: PERSON, ORGANIZATION, LOCATION, HUMAN PRODUCT, EVENT, which happen to be those frequently identified via a proper noun.
|
|
|
- moreover, for named entities, we consider both the multi-word case (*Anna Duval*) and the single-word case (*Italy*). Indeed, from an application standpoint, it would have been a pity to ignore the single-word case
|
|
|
- and cases (2), annotated as "multi-word expressions" (MWE), to be understood as non-NE multi-word expressions
|
|
|
Within PARSEME-FR, we have chosen to distinguish between:
|
|
|
- cases (1a)/(1b), which are generally considered in NLP as **named entities** and are generally associated with semantic types (person, organization, etc.). The term is somewhat confusing, since "named entity" should refer to the entity and not to the name, but we use it, as is usual in the NLP community, for entity names. We annotate these as named entities (NE), using a dedicated guide, provided they are of one of the following semantic types: PERSON, ORGANIZATION, LOCATION, HUMAN PRODUCT, EVENT (as these often happen to be named with a pure proper noun).
|
|
|
- moreover, for named entities, we annotate not only the multi-word case (*Anna Duval*) but also single-token entity names (*Italy*, *Anna*): from an application standpoint, it would be a pity to ignore the latter.
|
|
|
- for cases (2) and (1c) (which are not intuitively considered proper nouns), we use another guide and an MWE tag (to be understood as a non-NE multi-word expression).
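
The decision scheme in the list above can be summarised as a small function. This is only an illustrative sketch, not part of the guidelines: the names `NameKind`, `NE_TYPES` and `annotation_tag` are hypothetical, and two behaviours are assumptions of ours that the text does not state explicitly, namely that (1a)/(1b) names of other semantic types receive no tag, and that single-token (1c)/(2) items receive no tag because the MWE tag is reserved for multi-word expressions.

```python
from enum import Enum
from typing import Optional


class NameKind(Enum):
    """Hypothetical labels for the cases distinguished in the text."""
    PURE_PROPER = "1a"         # e.g. Italy, Anna Duval, Microsoft
    DESCRIPTIVE_ENTITY = "1b"  # e.g. Massif central
    UNIQUE_ENTITY = "1c"       # e.g. Euclidean geometry, machine translation
    CONCEPT = "2"              # e.g. neural network


# Semantic types listed in the guidelines as annotated named entities.
NE_TYPES = frozenset(
    {"PERSON", "ORGANIZATION", "LOCATION", "HUMAN PRODUCT", "EVENT"}
)


def annotation_tag(kind: NameKind, n_tokens: int,
                   semantic_type: Optional[str] = None) -> Optional[str]:
    """Return "NE", "MWE", or None (no annotation) for a nominal expression."""
    if kind in (NameKind.PURE_PROPER, NameKind.DESCRIPTIVE_ENTITY):
        # Named entities are annotated whether single-token or multi-word,
        # but only for the listed semantic types.
        # Assumption: any other semantic type is left unannotated.
        return "NE" if semantic_type in NE_TYPES else None
    # Cases (1c) and (2) go through the MWE guide.
    # Assumption: single-token items are not tagged at all.
    return "MWE" if n_tokens >= 2 else None
```

For instance, `annotation_tag(NameKind.PURE_PROPER, 1, "LOCATION")` corresponds to *Italy* and yields the NE tag, while `annotation_tag(NameKind.UNIQUE_ENTITY, 2)` corresponds to *machine translation* and yields the MWE tag.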
|
|
|
|
|
|
For cases of type (1c), the linguistic tests are closer to those for case (2), so we annotate them as MWEs.
|
|
|
Nevertheless, these objects share some characteristics, and some of the tests are similar.
|
|
|
|
|
|
|
|
|
Note that it follows from this difference that coding concept names in a dictionary makes more sense than coding the names of specific entities, since coding entity names would be endless. This is why dictionaries usually encode names of concepts, and only the names of entities that are famous in some way.
|
|
|
<!--It follows from this difference that there can be value in coding a concept name in a lexicon, but less so the name of a specific entity. See Kleiber 2007: "utilité linguistique plus restreinte ou « privée » pour les noms propres" ("a more restricted or 'private' linguistic utility for proper nouns") -->
|
|
|
|
|
|
|
|
|
|
... | ... | |