Corpus format description · Changes

Update Corpus format description authored Oct 30, 2019 by Marie Candito
Corpus-format-description.md
View page @ bb691f52
@@ -59,11 +59,11 @@ Similarly to _PARSEME:MWE_, the information in the 11th column called _PARSEME-F
 Here is an example of a sentence using the PARSEME-FR _cupt_ format described above,
-showing only columns 1 (ID), 2 (FORM) and 11 (MWE / NE annotation).
+showing only columns 1 (ID), 2 (FORM) and 11 (MWE / NE annotation). The sentence contains 8 MWEs/NEs, which we comment below:
-E.g. "Peugeot" is annotated as a final ORG named entity (NE-ORG.final), with identifier 2, and also as a primary PERS named entity with identifier 1.
+- ids 1 and 2: token 2 "Peugeot" is annotated as a final ORG named entity (NE-ORG.final), with identifier 2, and also as a primary PERS named entity with identifier 1.
-"tout au plus" is annotated as an MWE, more precisely tokens "tout", "à", "le" and "plus" are annotated with identifier 3 ("au" is a multi-word token which is not annotated). It has "ADV" as part-of-speech, meaning it behaves as an adverb, but it is considered irregular from the syntactic point of view. The criterion that was used to annotate it is "IRREG".
+- id 3: "tout au plus" is annotated as an MWE, more precisely tokens "tout", "à", "le" and "plus" are annotated with identifier 3 ("au" is a multi-word token which is not annotated). It has "ADV" as part-of-speech, meaning it behaves as an adverb, but it is considered irregular from the syntactic point of view. The criterion that was used to annotate it is "IRREG", hence the annotation 3:ADV|MWE|IRREG on the first token of the MWE ("tout").
 The sentence contains an example of a word (the support verb "effectuait") belonging to two LVCs: the LVC with id 6 contains tokens 21 and 23, and the LVC with id 7 contains tokens 21 and 26.
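
To make the column-11 layout concrete, here is a minimal Python sketch of how a cell such as `3:ADV|MWE|IRREG` might be split into its parts. It is not part of the PARSEME-FR tooling: the names `Annotation` and `parse_cell` are hypothetical, and the sketch assumes, as in the general PARSEME _cupt_ convention, that `*` and `_` mean "no annotation", that several annotations on one token are separated by `;`, and that non-initial tokens of an expression carry only the identifier.

```python
from typing import List, NamedTuple, Optional


class Annotation(NamedTuple):
    """One MWE/NE annotation read from column 11 (hypothetical helper type)."""
    ident: int                 # identifier of the MWE/NE within the sentence
    pos: Optional[str]         # part-of-speech of the whole expression, e.g. "ADV"
    category: Optional[str]    # e.g. "MWE" or "NE-ORG.final"
    criterion: Optional[str]   # annotation criterion, e.g. "IRREG"


def parse_cell(cell: str) -> List[Annotation]:
    """Split a column-11 cell into annotations.

    Assumptions (not confirmed by this page): "*" and "_" mean no annotation,
    and multiple annotations on one token are separated by ";".
    """
    if cell in ("*", "_"):
        return []
    annotations = []
    for part in cell.split(";"):
        ident, _, rest = part.partition(":")
        fields: List[Optional[str]] = list(rest.split("|")) if rest else []
        # Non-initial tokens carry only the identifier: pad the missing fields.
        fields += [None] * (3 - len(fields))
        annotations.append(Annotation(int(ident), fields[0], fields[1], fields[2]))
    return annotations


first_token = parse_cell("3:ADV|MWE|IRREG")   # cell on "tout"
later_token = parse_cell("3")                  # cell on a later token of the same MWE
```

Under these assumptions, a token shared by two expressions (like the support verb belonging to LVCs 6 and 7) would simply yield a list with two `Annotation` entries.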