... | ... | @@ -18,7 +18,7 @@ This decision tree is entered from the [general decision tree](https://gitlab.li |
|
|
## Step 1 - identifying a naming convention
|
|
|
Let _c_ denote the candidate sequence identified while following the [general decision tree](https://gitlab.lis-lab.fr/PARSEME-FR/PARSEME-FR-public/wikis/Guide-annotation-PARSEME_FR-chapeau), and _t_ the text being annotated.
|
|
|
|
|
|
apply [ObviousProper](#test-2-obviousproper-obvious-proper-name)(_c_,_t_)
|
|
|
apply [ObviousProper](#test-1-obviousproper-obvious-proper-name)(_c_,_t_)
|
|
|
**YES** => _c_ is a fuzzy NE, go to [Step 3](#step-3-establishing-the-span-of-a-named-entity)
|
|
|
**NO** => apply [NameConv](#test-3-nameconv-naming-convention)(_c_,_t_)
|
|
|
preciseNE => c is a **NE with a precise span**
|
... | ... | @@ -89,6 +89,76 @@ Examples: |
|
|
* _QUID DE L'AFFAIRE DES DISPARUS DU BEACH_? - _AFFAIRE DES DISPARUS DU BEACH_ is spelled with an initial uppercase letter because the whole sequence is in uppercase. Test not passed.
|
|
|
* _Il a évoqué l'Affaire des disparus du Beach_ - _Affaire des disparus du Beach_ (EVE) is spelled with an initial uppercase letter in the middle of a sentence, not for honorific reasons. We hypothesise that the author is signalling a proper name. Test passed.
|
|
|
|
|
|
------------------------------------------------
|
|
|
### Test 4 [Acron] - acronym
|
|
|
|
|
|
Does the candidate sequence have an acronym in the given text?
|
|
|
|
|
|
Examples:
|
|
|
* _Le Club Cynophile du Blaisois organise son concours demain. Toute l'équipe du CCB vous attend._ - test passed for _Club Cynophile du Blaisois_ and _CCB_ (ORG).
|
|
|
* _Structures d’insertion par l’activité économique (SIAE) en Vendée_ - test passed for _Structures d’insertion par l’activité économique_ (no NE) but not for _Structures d’insertion par l’activité économique en Vendée_ (no NE)
|
|
|
* _L'insertion par l'activité économique (IAE) est une des composantes de ce que l'on appelle aujourd'hui l'Économie Sociale et Solidaire (ESS)_ - test passed for _insertion par l'activité économique_ (no NE) and _Économie Sociale et Solidaire_ (no NE).
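The Acron test is a judgement made by the annotator, but the examples above can be approximated with a rough heuristic. The sketch below is only an illustration, not part of the guide: the function names `initials` and `has_acronym` and the `STOP_WORDS` list are assumptions, and the rule "take the first letter of every content word" will miss irregular acronyms.

```python
import re
import unicodedata

# Hypothetical list of function words skipped when building an acronym.
STOP_WORDS = {"de", "du", "des", "la", "le", "les", "et", "par", "en", "au", "aux"}


def strip_accents(s):
    """Remove diacritics, e.g. 'É' -> 'E'."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def initials(candidate):
    """Build a naive acronym from the first letter of each content word."""
    letters = []
    for word in re.split(r"[\s\-]+", candidate):
        # Drop an elided article or preposition: "d'insertion" -> "insertion".
        word = re.sub(r"^[A-Za-z]['’]", "", word)
        if not word or word.lower() in STOP_WORDS:
            continue
        letters.append(strip_accents(word[0]).upper())
    return "".join(letters)


def has_acronym(candidate, text):
    """True if the naive acronym of `candidate` occurs as a word in `text`."""
    acronym = initials(candidate)
    if len(acronym) < 2:
        return False
    return re.search(rf"\b{re.escape(acronym)}\b", text) is not None


text = "Structures d’insertion par l’activité économique (SIAE) en Vendée"
print(has_acronym("Structures d’insertion par l’activité économique", text))
print(has_acronym("Structures d’insertion par l’activité économique en Vendée", text))
```

On the second example above, the shorter candidate yields `SIAE`, which occurs in the text, while the longer one yields `SIAEV`, which does not. A positive result from such a heuristic is only a hint; the annotator still decides whether the test is passed.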
|
|
|
|
|
|
------------------------------------------------
|
|
|
### Test 5 [WebPage] - dedicated web page
|
|
|
|
|
|
Is there an official web page or a Wikipedia page titled by the candidate sequence?
|
|
|
|
|
|
Examples:
|
|
|
* _Mairie de Paris_ has a Wikipedia webpage titled with this precise sequence. Test passed.
|
|
|
* _Structures d’insertion par l’activité économique (SIAE) en Vendée_ - test passed for _Structures d’insertion par l’activité économique_ (there is a Wikipedia page) but not for _Structures d’insertion par l’activité économique en Vendée_
|
|
|
* _La fédération des entreprises d'insertion Pays de la Loire_ - it has its own official web page. Test passed.
|
|
|
|
|
|
------------------------------------------------
|
|
|
### Test 6 [MinSpan] - minimal span
|
|
|
|
|
|
Does the candidate sequence _c_ have the minimal span, i.e. is it true that no span shorter than _c_ still refers to the same entity as _c_? Note that this test may be context-specific, e.g. inhabitants of Blois might say _aller à la République_ to mean _aller à la place de la République_, but this information is not available to the wider population of French speakers. In case of doubt, we assume that the test is not passed.
|
|
|
|
|
|
Examples:
|
|
|
* _Le Havre - La Rochelle en bus_ - _Le Havre_ and _La Rochelle_ cannot be referred to by _Havre_ and _Rochelle_. Test passed.
|
|
|
* _palais Jacques Coeur_ cannot be referred to as _Jacques Coeur_. Test passed.
|
|
|
* _place de l'Etoile_ can also be referred to as _l'Etoile_. Test not passed.
|
|
|
|
|
|
------------------------------------------------
|
|
|
### Test 7 [SpanPerCat] - span per category
|
|
|
|
|
|
Note 1: This test does **not** apply to **every** NE of the given categories; it applies only when the other, more fine-grained tests have previously failed.
|
|
|
Note 2: According to the decision trees in [Step 3](#step-3-establishing-the-span-of-a-named-entity), this test is applied to candidate sequences for which the [MinSpan](#test-6-minspan-minimal-span) test failed, i.e. a shorter span can still refer to the same entity.
|
|
|
|
|
|
What is the **final type** of the candidate sequence _c_?
|
|
|
|
|
|
* **PERS** => exclude the classifier, e.g. _les frères [Dupond]_, _la famille [Champion]_, _le professeur [Władysław Strzemiński]_, _Mme [de la Clairy]_, _le président [Giscard d'Estaing]_
|
|
|
|
|
|
* **LOC**:
|
|
|
* exclude the classifier if one of the following
|
|
|
+ région e.g. _la région [Ile-de-France]_
|
|
|
+ département e.g. _département [Indre-et-Loire]_
|
|
|
+ ville e.g. _la ville de [Clermont-Ferrand]_
|
|
|
* otherwise keep the classifier, e.g.
|
|
|
+ rue, avenue, degré, escalier e.g. [_rue de la Paix_], [_avenue Jean-Jaurès_], [_escalier Denis Papin_], [_degré Saint-Laumer_]
|
|
|
+ place, square, rond-point e.g. [_place Victor Hugo_], [_square Léon Blum_], [_rond-point Charles-de-Gaulle_]
|
|
|
+ mer e.g. [_mer Baltique_], [_mer Égée_]
|
|
|
+ lac, étang e.g. [_lac Pavin_], [_étang Neuf_]
|
|
|
+ école e.g. [_école Notre-Dame_]
|
|
|
+ salle e.g. [_salle Jean-Mathieu_]
|
|
|
+ laiterie e.g. [_laiterie Besnier_], [_laiterie SOGECO_]
|
|
|
+ hôtel e.g. _l'hôtel [Formule 1]_
|
|
|
+ église e.g. [_église Notre-Dame_]
|
|
|
+ col e.g. [_col du Tourmalet_]
|
|
|
+ etc.
|
|
|
|
|
|
* **ORG**:
|
|
|
* exclude the classifier if one of the following
|
|
|
+ those that would be excluded for the primary type (if different from the final type), e.g. _la ville de [Clermont-Ferrand] a voté gauche_ (primary type: LOC, so _ville_ is excluded)
|
|
|
+ entreprise, e.g. _l'entreprise [Mc Kowal]_
|
|
|
+ société e.g. _la société [Lyon Tech]_
|
|
|
* otherwise keep the classifier, e.g.
|
|
|
* **PROD** => exclude the classifier, e.g. _le paquebot [Angelina Lauro]_
|
|
|
* **EVE** => exclude the classifier, e.g. _ouragan [El Niño]_
|
|
|
|
|
|
**Return** _c_.
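The per-category rules above can be summarised as a small lookup. The following is a minimal sketch under stated assumptions, not the guide's own procedure: the names `EXCLUDED_CLASSIFIERS`, `classifier_excluded` and `span_per_cat` are hypothetical, the classifier lists contain only the items explicitly listed above, and the candidate is assumed to be already split into a classifier, an optional linking word and a name.

```python
# Types whose classifier is always left out of the span.
ALWAYS_EXCLUDE = {"PERS", "PROD", "EVE"}

# Classifiers explicitly listed above as excluded, per final type.
EXCLUDED_CLASSIFIERS = {
    "LOC": {"région", "département", "ville"},
    "ORG": {"entreprise", "société"},
}


def classifier_excluded(classifier, final_type, primary_type=None):
    """Decide whether the classifier is left out of the NE span."""
    if final_type in ALWAYS_EXCLUDE:
        return True
    if classifier.lower() in EXCLUDED_CLASSIFIERS.get(final_type, set()):
        return True
    # ORG rule: also exclude what would be excluded for the primary type.
    if final_type == "ORG" and primary_type and primary_type != final_type:
        return classifier_excluded(classifier, primary_type)
    return False


def span_per_cat(classifier, name, final_type, primary_type=None, linker=""):
    """Return the annotated span: the name alone, or classifier + linker + name."""
    if classifier_excluded(classifier, final_type, primary_type):
        return name
    return " ".join(part for part in (classifier, linker, name) if part)


print(span_per_cat("ville", "Clermont-Ferrand", "LOC", linker="de"))
print(span_per_cat("rue", "la Paix", "LOC", linker="de"))
print(span_per_cat("ville", "Clermont-Ferrand", "ORG", primary_type="LOC", linker="de"))
```

With these inputs the first and third calls return `Clermont-Ferrand`, and the second returns `rue de la Paix`. Any classifier not listed as excluded is kept, which mirrors the "otherwise keep the classifier" branches.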
|
|
|
|
|
|
|
|
|
<!--------------------------------------------------------------------------------------------->
|
|
|
<!--------------------------------------------------------------------------------------------->
|
|
|
<!--------------------------------------------------------------------------------------------->
|
... | ... | @@ -203,73 +273,3 @@ Examples: |
|
|
-->
|
|
|
|
|
|
|
|
|
------------------------------------------------
|
|
|
### Test 5 [Acron] - acronym
|
|
|
|
|
|
Does the candidate sequence have an acronym in the given text?
|
|
|
|
|
|
Examples:
|
|
|
* _Le Club Cynophile du Blaisois organise son concours demain. Toute l'équipe du CCB vous attend._ - test passed for _Club Cynophile du Blaisois_ and _CCB_ (ORG).
|
|
|
* _Structures d’insertion par l’activité économique (SIAE) en Vendée_ - test passed for _Structures d’insertion par l’activité économique_ (no NE) but not for _Structures d’insertion par l’activité économique en Vendée_ (no NE)
|
|
|
* _L'insertion par l'activité économique (IAE) est une des composantes de ce que l'on appelle aujourd'hui l'Économie Sociale et Solidaire (ESS)_ - test passed for _insertion par l'activité économique_ (no NE) and _Économie Sociale et Solidaire_ (no NE).
|
|
|
|
|
|
------------------------------------------------
|
|
|
### Test 6 [WebPage] - dedicated web page
|
|
|
|
|
|
Is there an official web page or a Wikipedia page titled by the candidate sequence?
|
|
|
|
|
|
Examples:
|
|
|
* _Mairie de Paris_ has a Wikipedia webpage titled with this precise sequence. Test passed.
|
|
|
* _Structures d’insertion par l’activité économique (SIAE) en Vendée_ - test passed for _Structures d’insertion par l’activité économique_ (there is a Wikipedia page) but not for _Structures d’insertion par l’activité économique en Vendée_
|
|
|
* _La fédération des entreprises d'insertion Pays de la Loire_ - it has its own official web page. Test passed.
|
|
|
|
|
|
------------------------------------------------
|
|
|
### Test 7 [MinSpan] - minimal span
|
|
|
|
|
|
Does the candidate sequence _c_ have the minimal span, i.e. is it true that no span shorter than _c_ still refers to the same entity as _c_? Note that this test may be context-specific, e.g. inhabitants of Blois might say _aller à la République_ to mean _aller à la place de la République_, but this information is not available to the wider population of French speakers. In case of doubt, we assume that the test is not passed.
|
|
|
|
|
|
Examples:
|
|
|
* _Le Havre - La Rochelle en bus_ - _Le Havre_ and _La Rochelle_ cannot be referred to by _Havre_ and _Rochelle_. Test passed.
|
|
|
* _palais Jacques Coeur_ cannot be referred to as _Jacques Coeur_. Test passed.
|
|
|
* _place de l'Etoile_ can also be referred to as _l'Etoile_. Test not passed.
|
|
|
|
|
|
------------------------------------------------
|
|
|
### Test 8 [SpanPerCat] - span per category
|
|
|
|
|
|
Note 1: This test does **not** apply to **every** NE of the given categories; it applies only when the other, more fine-grained tests have previously failed.
|
|
|
Note 2: According to the decision trees in [Step 3](#step-3-establishing-the-span-of-a-named-entity), this test is applied to candidate sequences for which the [MinSpan](#test-7-minspan-minimal-span) test failed, i.e. a shorter span can still refer to the same entity.
|
|
|
|
|
|
What is the **final type** of the candidate sequence _c_?
|
|
|
|
|
|
* **PERS** => exclude the classifier, e.g. _les frères [Dupond]_, _la famille [Champion]_, _le professeur [Władysław Strzemiński]_, _Mme [de la Clairy]_, _le président [Giscard d'Estaing]_
|
|
|
|
|
|
* **LOC**:
|
|
|
* exclude the classifier if one of the following
|
|
|
+ région e.g. _la région [Ile-de-France]_
|
|
|
+ département e.g. _département [Indre-et-Loire]_
|
|
|
+ ville e.g. _la ville de [Clermont-Ferrand]_
|
|
|
* otherwise keep the classifier, e.g.
|
|
|
+ rue, avenue, degré, escalier e.g. [_rue de la Paix_], [_avenue Jean-Jaurès_], [_escalier Denis Papin_], [_degré Saint-Laumer_]
|
|
|
+ place, square, rond-point e.g. [_place Victor Hugo_], [_square Léon Blum_], [_rond-point Charles-de-Gaulle_]
|
|
|
+ mer e.g. [_mer Baltique_], [_mer Égée_]
|
|
|
+ lac, étang e.g. [_lac Pavin_], [_étang Neuf_]
|
|
|
+ école e.g. [_école Notre-Dame_]
|
|
|
+ salle e.g. [_salle Jean-Mathieu_]
|
|
|
+ laiterie e.g. [_laiterie Besnier_], [_laiterie SOGECO_]
|
|
|
+ hôtel e.g. _l'hôtel [Formule 1]_
|
|
|
+ église e.g. [_église Notre-Dame_]
|
|
|
+ col e.g. [_col du Tourmalet_]
|
|
|
+ etc.
|
|
|
|
|
|
* **ORG**:
|
|
|
* exclude the classifier if one of the following
|
|
|
+ those that would be excluded for the primary type (if different from the final type), e.g. _la ville de [Clermont-Ferrand] a voté gauche_ (primary type: LOC, so _ville_ is excluded)
|
|
|
+ entreprise, e.g. _l'entreprise [Mc Kowal]_
|
|
|
+ société e.g. _la société [Lyon Tech]_
|
|
|
* otherwise keep the classifier, e.g.
|
|
|
* **PROD** => exclude the classifier, e.g. _le paquebot [Angelina Lauro]_
|
|
|
* **EVE** => exclude the classifier, e.g. _ouragan [El Niño]_
|
|
|
|
|
|
**Return** _c_.
|
|
|
|
|
|
|