... | @@ -32,20 +32,26 @@ For **nominal multi-word expressions**, we use a primary distinction concerning |
... | @@ -32,20 +32,26 @@ For **nominal multi-word expressions**, we use a primary distinction concerning |
|
|
|
|
|
In this latter case, knowing the defining characteristics of the concept enables one to use it for future instances, without having to learn any new naming convention. This contrasts with entity names: in order to use the name *Anna Duval* for a new person, one needs to learn a new naming convention linking the name to this new person, and the characteristics of the person play almost no role (to be precise, with such an example the name tells us the person should be a woman). Note that
|
|
In this latter case, knowing the defining characteristics of the concept enables one to use it for future instances, without having to learn any new naming convention. This contrasts with entity names: in order to use the name *Anna Duval* for a new person, one needs to learn a new naming convention linking the name to this new person, and the characteristics of the person play almost no role (to be precise, with such an example the name tells us the person should be a woman). Note that
|
|
- an entity name may well be ambiguous (e.g. several people bearing the same name), but the key differentiating trait between (1) and (2) concerns whether or not there must be a naming convention at the level of each entity (Kleiber, 2007)
|
|
- an entity name may well be ambiguous (e.g. several people bearing the same name), but the key differentiating trait between (1) and (2) concerns whether or not there must be a naming convention at the level of each entity (Kleiber, 2007)
|
|
- for concept names of course there is also a naming convention (why use the noun *table* for a table), but it is defined at the level of the class of entities, not at the level of each entity.
|
|
- for concept names of course there is also a naming convention (why use the noun *table* for a table), but it is defined at the level of the class of entities, not at the level of each entity. In a given context, an NP headed by *table* may refer to a specific table *t*, but this does not involve any naming convention for this particular table.
|
|
|
|
|
|
This distinction between entity name and instantiable concept name is reminiscent of the proper noun versus common noun distinction, but the latter distinction is not easy to define precisely. Of course, lexical items that are exclusively used for directly naming entities (e.g. the first and last names of people) are easily classified as proper nouns (sometimes called **pure proper nouns**). This is why Ehrmann (2008) roughly defines proper nouns as the "désignation d’une entité précise par le biais d’une description dont le sens joue un rôle mineur par rapport à la dénomination, opérant directement, du référent" ("the designation of a precise entity via a description whose meaning plays a minor role with respect to the denomination of the referent, which operates directly").
|
|
This distinction between entity name and instantiable concept name is reminiscent of the proper noun versus common noun distinction, but the latter distinction is not easy to define precisely. Of course, lexical items that are exclusively used for directly naming entities (e.g. the first and last names of people) are easily classified as proper nouns (sometimes called **pure proper nouns**). This is why Ehrmann (2008) roughly defines proper nouns as the "désignation d’une entité précise par le biais d’une description dont le sens joue un rôle mineur par rapport à la dénomination, opérant directement, du référent" ("the designation of a precise entity via a description whose meaning plays a minor role with respect to the denomination of the referent, which operates directly").
|
|
But an abundant literature shows that the proper / common noun distinction proves difficult to characterize in linguistic terms (we refer primarily to Kleiber (2001; 2007) and Ehrmann (2008) for a state of the art). Indeed, within entity names, we can distinguish:
|
|
But an abundant literature shows that the proper / common noun distinction proves difficult to characterize in linguistic terms (we refer primarily to Kleiber (2001; 2007) and Ehrmann (2008) for a state of the art). Indeed, within names of specific entities, we can distinguish:
|
|
- **(1a)** entity names composed of lexical items that are dedicated to naming entities (in short: proper nouns), such as *Italy*, *Anna Duval*, *Microsoft*
|
|
- **(1a)** entity names composed of lexical items that are dedicated to naming entities (pure proper nouns), such as *Italy*, *Anna Duval*, *Microsoft*
|
|
- **(1b)** entity names that have a descriptive basis, such as the "International League against Racism and Anti-Semitism" or the "Massif central" (literally the "central massif"): the naming convention between the entity and the name is sociologically typical of a proper noun (the name of an association, of a geographical item), but also clearly results from the compatibility between the characteristics of the entity and the meaning of the lexical items
|
|
- **(1b)** entity names that have a descriptive basis, such as the "International League against Racism and Anti-Semitism" or the "Massif central" (literally the "central massif"): the naming convention between the entity and the name is sociologically typical of a proper noun (the name of an association, of a geographical item), but also clearly results from the compatibility between the characteristics of the entity and the meaning of the lexical items
|
|
- **(1c)** but also names which serve to designate unique abstract entities, such as abstract simple nouns (*taxidermy*) or abstract MWEs (*Euclidean geometry*, *machine translation*), and names referring to unique concrete entities such as the sun or the moon (often called "unica"): because of the uniqueness of the entity that can be called that way, they too can be viewed as entity names, for which the speakers have to learn the naming convention.
|
|
- **(1c)** but also names which serve to designate unique abstract entities, such as abstract simple nouns (*taxidermy*) or abstract MWEs (*Euclidean geometry*, *machine translation*): because of the uniqueness of the entity that can be called that way, they too can be viewed as entity names, for which the speakers have to learn the naming convention.
|
|
|
|
|
|
Now the thing is that cases like unique abstract terms (*machine translation*) are traditionally not viewed as proper nouns, and concrete unica like the moon are widely debated. Kleiber (2007) argues that unica terms necessarily name unique entities, whereas this is not the case for entity names (cf. supra ambiguity of entity names). Kleiber (1995) argues that the moon is viewed as a unique entity, whereas Mars is a name that serves to identify a particular planet within the class of planets. While these arguments seem arbitrary to us, we keep the tradition of considering (1b) cases as proper nouns, and (1c) cases as common nouns.
|
|
Now the thing is that cases (1c) are traditionally not viewed as proper nouns. Kleiber (1996) argues that proper nouns function to name a particular entity within a specified class (a particular person within the class of persons). We keep this tradition of considering (1b) cases as proper nouns, and (1c) cases as common nouns.
|
|
|
|
|
|
|
|
Note that there are also names referring to **unique concrete entities** such as the sun or the moon (often called "unica"), whose status is widely debated. We tag these as named entities, unless they are used to refer to an instance of a class. For the moon, we consider two cases:
|
|
|
|
- when *the Moon* or *the moon* refers to the natural satellite of the Earth, we consider it a proper noun and a named entity
|
|
|
|
- when it is used to refer to an instance of a natural satellite (as in *several planets have moons*), we consider it a common noun and not a named entity
|
|
|
|
We find an intuitive distinction between cases (1b) and (1c), and concrete unica like the *moon* (or the *Moon*) are widely debated. Kleiber (2007) argues that unica terms necessarily name unique entities, whereas this is not the case for entity names (cf. supra ambiguity of entity names). Kleiber (1995) argues that the moon is viewed as a unique entity, whereas Mars is a name that serves to identify a particular planet within the class of planets. While these arguments seem arbitrary to us,
|
|
|
|
|
|
|
|
, and names referring to unique concrete entities such as the sun or the moon (often called "unica"):
|
|
|
|
|
|
Within PARSEME-FR, we have chosen to distinguish between:
|
|
Within PARSEME-FR, we have chosen to distinguish between:
|
|
- cases (1a)/(1b), which are generally considered in NLP as **named entities** (the term is a bit confusing, since "named entity" should refer to the entity and not the name, but we will use it for entity names, as is usual in the NLP community); named entities are generally associated with semantic types (person, organization, etc.). We annotate these as named entities (EN), using a dedicated guide, provided they are of one of the following semantic types: PERSON, ORGANIZATION, LOCATION, HUMAN PRODUCT, EVENT (as these happen to often be named with a pure proper noun).
|
|
- cases (1a)/(1b), which are generally considered in NLP as **named entities** and associated with a semantic type (person, organization, etc.). Although the term is confusing (the linguistic expression is an entity name, not a named entity), we will use it in the following for entity names, as is usual in the NLP community. We annotate these as named entities (EN), using a dedicated guide, provided they are of one of the following semantic types: PERSON, ORGANIZATION, LOCATION, HUMAN PRODUCT, EVENT (as these happen to often be named with a pure proper noun).
|
|
- moreover, for named entities, we annotate not only the polylexical case (*Anna Duval*) but also single-token entity names (*Italy*, *Anna*): indeed, from the applicative point of view, it would be a pity to ignore the latter.
|
|
- moreover, for named entities, we annotate both the multiword case (*Anna Duval*) and the single-token cases (*Italy*, *Anna*): indeed, from the applicative point of view, it would be a pity to ignore the latter.
|
|
- for cases (2) and (1c) (which are not intuitively considered proper nouns) we use another guide, and an MWE tag (to be understood as non-NE multi-word expression).
|
|
- for cases (2) and (1c) (which are not intuitively considered proper nouns) we use another guide, and an MWE tag (to be understood as non-NE multi-word expression).
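
The choice of tag described in the list above can be summarised as a small decision function. This is only an illustrative sketch, not part of the annotation tooling: the names `NameCase`, `NE_TYPES` and `choose_tag` are hypothetical, and two behaviours are assumptions not stated in the guide, namely that an entity name outside the listed semantic types receives no tag, and that a single-token expression of case (1c) or (2) receives no MWE tag.

```python
from enum import Enum
from typing import Optional


class NameCase(Enum):
    """Hypothetical labels for the cases discussed in this guide."""
    PURE_PROPER = "1a"          # e.g. Italy, Anna Duval
    DESCRIPTIVE_ENTITY = "1b"   # e.g. Massif central
    UNIQUE_ABSTRACT = "1c"      # e.g. machine translation
    INSTANTIABLE_CONCEPT = "2"  # instantiable concept names


# Semantic types for which entity names are annotated as named entities.
NE_TYPES = {"PERSON", "ORGANIZATION", "LOCATION", "HUMAN PRODUCT", "EVENT"}


def choose_tag(case: NameCase, n_tokens: int,
               semantic_type: Optional[str] = None) -> Optional[str]:
    """Return "EN", "MWE", or None (no annotation) for a candidate expression."""
    if case in (NameCase.PURE_PROPER, NameCase.DESCRIPTIVE_ENTITY):
        # Named entities are annotated whether single-token or multiword.
        # Assumption: names outside the listed semantic types get no tag.
        return "EN" if semantic_type in NE_TYPES else None
    # Cases (1c) and (2): non-NE multi-word expressions.
    # Assumption: a single token cannot receive the MWE tag.
    return "MWE" if n_tokens > 1 else None
```

For instance, `choose_tag(NameCase.PURE_PROPER, 1, "LOCATION")` corresponds to *Italy*, and `choose_tag(NameCase.UNIQUE_ABSTRACT, 2)` to *machine translation*.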
|
|
|
|
|
|
Nevertheless, these objects share some characteristics, and some of the tests are similar.
|
|
Nevertheless, these objects share some characteristics, and some of the tests are similar.
|
... | | ... | |