Extrinsic evaluation of question generation methods with user journey logs | HumEval @ LREC-COLING 2024
Article
Data
Generated questions
Directories QUESTIONS_BARTHEZ and QUESTIONS_MISTRAL contain automatically generated questions with the following structure.
QUESTIONS_*
|-- Issue 1
|-- Issue 2
| |-- Issue 2, Article 1
| | |-- Issue 2, Article 1, Page 1
| | |-- Issue 2, Article 1, Page 2
| |-- Issue 2, Article 2
| | |-- Issue 2, Article 2, Page 3
| | |-- Issue 2, Article 2, Page 4
| | |-- Issue 2, Article 2, Page 5
| |-- ...
... ...
/!\ Page numbering is relative to the issue, not to the article: in the tree above, Article 2 of Issue 2 starts at Page 3.
Each page corresponds to a .json file in the following format:
{
"num": "autog_0005-0970_1968_num_7_1", (Current issue)
"article": "autog_0005-0970_1968_num_7_1_932", (Current article)
"page_name": "FMSH_PB188a_00007_039", (Current page name)
"questions": [ (List of generated questions /!\ For BARThez, the list consists only of the text of the questions generated from the page)
{
"id_qa": "FMSH_PB188a_00007_039_qa28_v3", (Unique id representing the question)
"id_block": "block_1", (Id of the paragraph in which the question was generated /!\ only for BARThez)
"question": "A qui est destiné le projet de texte ?", (Generated question)
"answer": "aux étudiants de sociologie", (Answer for which the question was generated /!\ only for BARThez)
"answer_stringIds": [ (StringIds of the answer /!\ useful for mapping the answer back to the original data)
"string_17",
"string_18",
"string_19",
"string_20"
]
},
...
]
}
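The page files can be read with the standard library alone. The sketch below is not part of the repository: the helper names `extract_question_texts` and `iter_pages` are hypothetical, and the code assumes only the keys shown in the format above. Since the `questions` list may hold either bare question strings or full records depending on the model, both shapes are handled.

```python
import json
from pathlib import Path


def extract_question_texts(page: dict) -> list[str]:
    """Return the question texts of one page, whatever the entry shape."""
    texts = []
    for entry in page.get("questions", []):
        if isinstance(entry, str):
            # Entry is the bare text of the question.
            texts.append(entry)
        elif isinstance(entry, dict) and "question" in entry:
            # Entry is a full record (id_qa, question, answer, ...).
            texts.append(entry["question"])
    return texts


def iter_pages(root: str):
    """Yield (path, parsed page) for every .json file below a QUESTIONS_* directory."""
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            yield path, json.load(f)
```

For example, looping over `iter_pages("QUESTIONS_BARTHEZ")` and calling `extract_question_texts(page)` on each result collects the questions page by page; the `num`, `article` and `page_name` fields of each page identify where the questions come from.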
Collection graphs
The graphs directory contains the graph of the collection in each of the forms studied in the article:
- only_question_barthez : G_q in the article
- only_question_mistral : G_mistral in the article
- question_answer_barthez : G_qa in the article
- textblocks : G_para in the article
They can easily be reused with NetworkX or other graph libraries, as follows:
import networkx as nx
G = nx.read_gexf("datas/graphs/only_question_barthez.gexf", relabel=True)
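Once a graph is loaded, a quick sanity check is often useful before any further analysis. The following is a minimal sketch, not code from the article: `summarize_graph` is a hypothetical helper, and it relies only on generic NetworkX calls, without assuming anything about what the nodes or edge attributes represent.

```python
import networkx as nx


def summarize_graph(G: nx.Graph) -> dict:
    """Return a few basic statistics for a collection graph."""
    # Connected components are defined on undirected graphs,
    # so work on an undirected copy in case the graph is directed.
    undirected = G.to_undirected()
    return {
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        "components": nx.number_connected_components(undirected),
        "isolated": sum(1 for _, degree in G.degree() if degree == 0),
    }
```

Calling `summarize_graph(G)` on the graph loaded above returns a small dictionary that can be compared across the different graph forms.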
User exploration path
All articles browsed by users are stored in the JSON file "dict_article_user.json". The file is a dictionary that maps each anonymised user (key) to the list of articles that user browsed (value).
/!\ A few visited articles are missing from the graphs and from the generated question files because no questions could be generated from them.
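The exploration paths can be loaded and cleaned in a few lines. This is a sketch under stated assumptions rather than the authors' pipeline: the helper names are hypothetical, article identifiers are assumed to be strings, and the set `known_articles` (the articles for which questions exist) has to be built by the caller, for instance from the question files.

```python
import json
from collections import Counter


def load_user_paths(path: str) -> dict:
    """Load the user -> list of browsed articles dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def article_visit_counts(user_paths: dict) -> Counter:
    """Count how many distinct users browsed each article."""
    counts = Counter()
    for articles in user_paths.values():
        # A set avoids counting repeated visits by the same user.
        counts.update(set(articles))
    return counts


def keep_known_articles(user_paths: dict, known_articles: set) -> dict:
    """Drop articles without generated questions, keeping the stored order."""
    return {
        user: [a for a in articles if a in known_articles]
        for user, articles in user_paths.items()
    }
```

Filtering with `keep_known_articles` before matching paths against a graph avoids lookups for the few articles that have no generated questions.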