OnToNotes
This repository contains the data used for the OnToNotes experiments on task inclusion estimation (ACL 2025). You can find the paper here:
Organisation of the repository
This folder contains several versions of the OnToNotes dataset. The original dataset provides various linguistic annotations at the sentence level. We transformed these annotations to obtain a document-level annotation, as well as an annotation format better suited to Instruct-type models.
- onto_json_name_summary_v1: this folder contains linguistic annotations at the sentence level and summaries generated with ChatGPT 3.5.
- corpus_onto_compress_1605: this folder contains, in JSON format, OnToNotes documents with their corresponding annotations. Summaries of various sizes are also available.
To load the data, please refer to the code section below, which gives the link to the relevant repository.
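For a quick look at the files, the following is a minimal sketch, not the authors' loading code. It assumes the corpus_onto_compress_1605 folder has been downloaded locally and that its files use a `.json` extension; the internal schema is not documented here, so the sketch only parses each file and reports its top-level type. The helper name `load_json_folder` is hypothetical.

```python
import json
from pathlib import Path


def load_json_folder(folder):
    """Parse every .json file under `folder`, keyed by its relative path.

    Hypothetical helper: it makes no assumption about the schema of the
    files, it only requires them to be valid JSON encoded as UTF-8.
    """
    root = Path(folder)
    documents = {}
    # rglob also walks sub-folders, in case the files are nested.
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            documents[path.relative_to(root).as_posix()] = json.load(handle)
    return documents


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the folder was downloaded.
    data = load_json_folder("corpus_onto_compress_1605")
    for name, content in data.items():
        # Print the top-level JSON type (dict, list, ...) of each file.
        print(name, type(content).__name__)
```

Inspecting the top-level type and keys of one file is usually enough to see how documents, annotations, and summaries are organised before writing task-specific code.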
Code for the ACL 2025 experiments
TBA!