Adding data and README

Commit 3252a942 authored 9 months ago by Elie Antoine
--- a/README.md
+++ b/README.md
+# README
+## JSON Structure
+The JSON object contains the following fields:
+- `lu`: The lexical unit (LU) in the sentence.
+- `pos_lu`: The part-of-speech tag of the LU (**corresponding to *f<sub>trigger</sub>***).
+- `lemma_lu`: The lemma or root form of the LU.
+- `frame`: The semantic frame associated with the LU.
+- `question`: The text of the question.
+- `id`: A unique identifier for the question-answer pair.
+- `answers`: A list of dictionaries containing the reference answers with the following fields:
+  - `text`: The text of the reference answer.
+  - `role`: The semantic role of the answer.
+  - `answer_start`: The starting character offset of the answer in the context.
+  - `answer_end`: The ending character offset of the answer in the context.
+  - `coref`: A dictionary for coreference information, with `anchor` and `mentions` fields.
+  - `wrong_answer`: The original, incorrect reference answer, if a correction was made.
+- `predictions`: A dictionary containing model predictions and corresponding ROUGE-L scores. Each model has an entry with:
+  - `answer_pred`: The answer predicted by the model.
+  - `rougeL`: The ROUGE-L score of the prediction.
+- `human_annot`: A dictionary containing human annotations for each model's output. Each model has an entry which is a list of annotations:
+  - `annot`: The annotation identifier.
+  - `rating`: The rating given by the human annotator (e.g., "Correct").
+- `lu_in_question`: Boolean indicating whether the trigger appears in the question for this example (**corresponding to *f<sub>LU in q</sub>***).
+- `nb_fe_frame`: Number of Frame Elements in the frame that triggered the question (**corresponding to *f<sub>nb FEs</sub>***).
+- `list_dep_lu_ans`: List of the dependency relations crossed between the answer and the frame trigger.
+- `nb_arc_lu_ans`: Number of dependency arcs between the answer and the trigger of the question's frame (**corresponding to *f<sub>dist</sub>***).
+- `entropy_frame`: Entropy of the question's frame, shared by all examples of this frame (**corresponding to *f<sub>entropy</sub>***).
+- `complexity_vector`: A binary vector with one element per complexity factor: 1 if the factor is "active", meaning the example belongs to the difficult group for that factor, and 0 otherwise. Indexes correspond to the following complexity factors:
+  - `0`: ***f<sub>LU in q</sub>***
+  - `1`: ***f<sub>trigger</sub>***
+  - `2`: ***f<sub>dist</sub>***
+  - `3`: ***f<sub>entropy</sub>***
+  - `4`: ***f<sub>nb FEs</sub>***
+## Example
+Here is an example of the JSON structure:
+```json
+{
+  "lu": "devient",
+  "pos_lu": "AUXE",
+  "lemma_lu": "devenir",
+  "frame": "Becoming",
+  "question": "Quel type d'État devient l'Irlande ?",
+  "id": "8abee7c1-e632-4168-8a9c-225eb7e15f43",
+  "answers": [
+    {
+      "text": "un état souverain et indépendant",
+      "role": "Final_category",
+      "answer_start": 279,
+      "answer_end": 311,
+      "coref": {
+        "anchor": {},
+        "mentions": []
+      },
+      "wrong_answer": ""
+    }
+  ],
+  "predictions": {
+    "MT5-large_260_AP0": {
+      "answer_pred": "état souverain et indépendant",
+      "rougeL": 1.0
+    }
+  },
+  "human_annot": {
+    "MT5-large": [
+      {
+        "annot": "annot_1",
+        "rating": "Correct"
+      }
+    ]
+  },
+  "lu_in_question": true,
+  "nb_fe_frame": 2,
+  "list_dep_lu_ans": [
+    "conj",
+    "obj"
+  ],
+  "nb_arc_lu_ans": 1,
+  "complexity_vector": [
+    0,
+    1,
+    0,
+    0,
+    1
+  ]
+}
+```
\ No newline at end of file
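As a reading aid for the fields described in the README, the following is a minimal Python sketch of how one record might be interpreted. It works on an abridged copy of the README example rather than on `calor_complexity.json`, whose top-level layout is not shown here. The labels in `FACTOR_NAMES` and the helper names `active_factors` and `answer_span_length` are hypothetical and chosen only for illustration; only the index order comes from the README.

```python
import json

# Index order taken from the README's complexity_vector description.
# The string labels themselves are illustrative, not dataset identifiers.
FACTOR_NAMES = ["f_LU_in_q", "f_trigger", "f_dist", "f_entropy", "f_nb_FEs"]


def active_factors(record):
    """Return the labels of the complexity factors set to 1 for a record."""
    vector = record["complexity_vector"]
    if len(vector) != len(FACTOR_NAMES):
        raise ValueError("expected %d factors, got %d"
                         % (len(FACTOR_NAMES), len(vector)))
    return [name for name, flag in zip(FACTOR_NAMES, vector) if flag == 1]


def answer_span_length(answer):
    """Return the number of characters between the two answer offsets."""
    return answer["answer_end"] - answer["answer_start"]


# Abridged copy of the README example; fields not used below are omitted.
record = json.loads("""
{
  "lu": "devient",
  "frame": "Becoming",
  "answers": [
    {
      "text": "un état souverain et indépendant",
      "role": "Final_category",
      "answer_start": 279,
      "answer_end": 311
    }
  ],
  "complexity_vector": [0, 1, 0, 0, 1]
}
""")

print(active_factors(record))
print(answer_span_length(record["answers"][0]))
```

In the README example, the distance between `answer_start` and `answer_end` matches the length of the answer text, which suggests that `answer_end` is an exclusive offset. This is an observation from a single record, not a documented guarantee.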
--- a/calor_complexity.json
+++ b/calor_complexity.json