... | ... | @@ -10,7 +10,7 @@ PARSEME-FR annotation guidelines - v1.0 |
|
|
- [Background: verbal MWEs of PARSEME and distinction between named entities and MWEs](#background:-verbal-mwes-of-parseme-and-distinction-between-named-entities-and-mwes)
|
|
|
- [Top decision tree](#top-decision-tree), which serves to direct the annotator to either:
|
|
|
- [the PARSEME v.1.1 guide for **verbal MWEs** (external link)](http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.1)
|
|
|
- [the PARSEME-FR guide for **named entities** (NE)](ep_et_en)
|
|
|
- [the PARSEME-FR guide for **named entities** (NE)](ne-decision-tree)
|
|
|
- [the PARSEME-FR guide for other MWEs (not NE, and non verbal)](Criteres)
|
|
|
|
|
|
|
... | ... | @@ -40,21 +40,20 @@ But an abundant litterature shows that the proper / common noun distinction reve |
|
|
- **(1b)** entity names that have a descriptive basis, such as the "International League against Racism and Anti-Semitism" or the "Massif central" (literally the "central massif"): the naming convention between the entity and the name is sociologically typical of a proper noun (the name of an association, of a geographical item), but it also clearly results from the compatibility between the entity's characteristics and the meaning of the lexical items
|
|
|
- **(1c)** but also names which serve to designate unique abstract entities, such as abstract simple nouns (*taxidermy*) or abstract MWEs (*Euclidean geometry*, *machine translation*): because only one entity can be called that way, they too can be viewed as entity names, for which speakers have to learn the naming convention.
|
|
|
|
|
|
However, cases (1c) are traditionally not viewed as proper nouns. Kleiber (1996) argues that proper nouns function to name a particular entity within a specified class (a particular person within the class of persons). We keep this tradition of considering (1b) cases as proper nouns, and (1c) cases as common nouns.
|
|
|
However, cases (1c) are traditionally not viewed as proper nouns. Kleiber (1996) argues that proper nouns function to name a particular entity within a specified class (a particular person within the class of persons).
|
|
|
|
|
|
Note that there are also names referring to **unique concrete entities** such as the sun or the moon (often called "unica"), whose status is widely debated. We tag these as named entities, unless they are used to refer to an instance of a class:
|
|
|
- when the Moon or the moon refers to the natural satellite of the Earth, we consider it as a proper noun and a named entity
|
|
|
- when it is used to refer to an instance of the class of natural satellites (as in *several planets have moons*), we do not tag it as a named entity
|
|
|
We find an intuitive distinction between cases (1b) and (1c), and concrete unica like the *moon* (or the *Moon*) are widely debated. Kleiber (2007) argues that unica terms necessarily name unique entities, whereas this is not the case for entity names (cf. supra ambiguity of entity names). Kleiber (1995) argues that the moon is viewed as a unique entity, whereas Mars is a name that serves to identify a particular planet within the class of planets. While these arguments seem arbitrary to us,
|
|
|
|
|
|
, and names referring to unique concrete entities such as the sun or the moon (often called "unica"):
|
|
|
|
|
|
Within PARSEME-FR, we have chosen to distinguish between:
|
|
|
Within PARSEME-FR, we have chosen to keep this tradition of considering (1b) cases as proper nouns, and (1c) cases as common nouns. We distinguish between:
|
|
|
- cases (1a)/(1b), which are generally considered in NLP as **named entities** and associated with a semantic type (person, organization, etc.). Although the term is confusing (the linguistic expression is an entity name, not a named entity), we use it below for entity names, as is usual in the NLP community. We annotate these as named entities (NE), using a dedicated guide, provided they are of one of the following semantic types: PERSON, ORGANIZATION, LOCATION, HUMAN PRODUCT, EVENT (as these are often named with a pure proper noun).
|
|
|
- moreover, for named entities, we annotate both the multiword cases (*Anna Duval*) and the single-token cases (*Italy*, *Anna*): from an application point of view, it would be a pity to ignore the latter.
|
|
|
- for cases (2) and (1c) (which are not intuitively considered proper nouns), we use another guide and an MWE tag (to be understood as a non-NE multiword expression).
|
|
|
|
|
|
Nevertheless, these objects share some characteristics, and some of the tests are similar.
|
|
|
|
|
|
Note that there are also names referring to **unique concrete entities** such as the sun or the moon (often called "unica"), whose status is widely debated. We tag these as named entities (e.g. in *I can see you thanks to the moon*), unless they are used to refer to an instance of a class (as in *several planets have moons*).
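
The tagging policy described above can be summarized as a small routing function. This is only an illustrative sketch, not part of the guidelines or of any PARSEME tooling: the names `Candidate` and `choose_tag`, the string labels for the cases, and the `"unicum"` label are all hypothetical. The function only says which tag a candidate is eligible for; the final decision still depends on the tests in the corresponding guide.

```python
from dataclasses import dataclass
from typing import List, Optional

# Semantic types annotated as named entities, as listed in the guidelines.
NE_TYPES = {"PERSON", "ORGANIZATION", "LOCATION", "HUMAN PRODUCT", "EVENT"}


@dataclass
class Candidate:
    tokens: List[str]                       # surface tokens of the candidate
    case: str                               # hypothetical label: "1a", "1b", "1c", "2" or "unicum"
    semantic_type: Optional[str] = None     # only meaningful for cases (1a)/(1b)
    refers_to_class_instance: bool = False  # only meaningful for unica


def choose_tag(c: Candidate) -> Optional[str]:
    """Return "NE", "MWE", or None when no tag is prescribed here."""
    if c.case in ("1a", "1b"):
        # Single-token and multiword names are both annotated as NE,
        # provided the semantic type is one of the listed ones.
        return "NE" if c.semantic_type in NE_TYPES else None
    if c.case == "unicum":
        # Unica are tagged as NE unless they denote an instance of a class.
        return None if c.refers_to_class_instance else "NE"
    if c.case in ("1c", "2"):
        # Assumption: the MWE tag only applies to multiword candidates.
        return "MWE" if len(c.tokens) > 1 else None
    return None
```

For instance, `choose_tag(Candidate(["Italy"], "1a", "LOCATION"))` yields `"NE"`, while *moons* in *several planets have moons*, encoded with `refers_to_class_instance=True`, yields `None`.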
|
|
|
|
|
|
Our annotation process uses a top decision tree that directs the annotator to one of three guides: the guide for verbal MWEs, the guide for NEs, or the guide for other MWEs.
|
|
|
|
|
|
<!--concrete unica like the *moon* (or the *Moon*) are widely debated. Kleiber (2007) argues that unica terms necessarily name unique entities, whereas this is not the case for entity names (cf. supra ambiguity of entity names). Kleiber (1995) argues that the moon is viewed as a unique entity, whereas Mars is a name that serves to identify a particular planet within the class of planets. While these arguments seem arbitrary to us, -->
|
|
|
|
|
|
|
|
|
<!--It follows from this difference that there may be an interest in encoding a concept name in a lexicon, but less so a specific entity name. See Kleiber 2007 "more restricted or 'private' linguistic utility for proper nouns" -->
|
... | ... | @@ -79,7 +78,7 @@ There are two types of candidates for a potential annotation: |
|
|
|
|
|
Note that for some candidates, it may be unclear at first whether they will be tagged as a named entity or as an MWE, and what their exact span is. The annotators should decide using the decision tree.
|
|
|
|
|
|
### decision tree
|
|
|
### Decision tree
|
|
|
|
|
|
For a given candidate expression *c*:
|
|
|
|
... | ... | @@ -99,8 +98,7 @@ For a given candidate expression *c*: |
|
|
- *Une **arme blanche** est une arme tranchante, perforante ou contondante dont la mise en œuvre n'est due qu'à la force humaine…*
|
|
|
- *Le **conseil départemental** est l'assemblée délibérante d'un département*
|
|
|
- Use of a plural to refer to all objects of the class defined by *c*:
|
|
|
- ***Edged weapons** are prohibited in a plane*
|
|
|
<!--Les armes blanches sont interdites dans un avion-->
|
|
|
- ***Edged weapons** are prohibited in a plane* <!--Les armes blanches sont interdites dans un avion-->
|
|
|
- ***Red-haired people** are rare*
|
|
|
- ***Tables made of wood** last longer*
|
|
|
- Use of a plural to refer to several objects of a class:
|
... | ... | @@ -143,7 +141,8 @@ For a given candidate expression *c*: |
|
|
|
|
|
<!--We also provide a list of [difficult cases resolved using the criteria](cas_deja_traites)-->
|
|
|
|
|
|
<!--The general procedure for annotating is as follows:
|
|
|
<!--OLD VERSION of the top decision tree (modified June 2019)
|
|
|
The general procedure for annotating is as follows:
|
|
|
|
|
|
**For a sequence of several tokens for which one has the intuition that the meaning of the expression is obtained idiosyncratically and/or that the selection of its parts is not free (at the morphological or lexical level, substitutions that are normally possible are not possible or produce an unexpected change of meaning), follow the tree below:**
|
|
|
|
... | ... | |