autogestion-qa-linking

    Extrinsic evaluation of question generation methods with user journey logs | HumEval @ LREC-COLING 2024

    Article

    Data

    Generated questions

    The directories QUESTIONS_BARTHEZ and QUESTIONS_MISTRAL contain the automatically generated questions, organised as follows:

    QUESTIONS_*
    |-- Issue 1
    |-- Issue 2
    |   |-- Issue 2, Article 1 
    |   |   |-- Issue 2, Article 1, Page 1
    |   |   |-- Issue 2, Article 1, Page 2
    |   |-- Issue 2, Article 2
    |   |   |-- Issue 2, Article 2, Page 3
    |   |   |-- Issue 2, Article 2, Page 4
    |   |   |-- Issue 2, Article 2, Page 5
    |   |-- ...
    ... ... 

    /!\ Page numbering is relative to the issue, not to the article: numbering continues from one article to the next within the same issue.

    Each page corresponds to a .json file in the following format:

    {
      "num": "autog_0005-0970_1968_num_7_1", (Current issue)
      "article": "autog_0005-0970_1968_num_7_1_932", (Current article)
      "page_name": "FMSH_PB188a_00007_039", (Current page name)
      "questions": [ (List of generated questions /!\ For Mistral, the list consists only of the text of the questions generated from the page)
        {
          "id_qa": "FMSH_PB188a_00007_039_qa28_v3", (Unique id representing the question)
          "id_block": "block_1", (Id of the paragraph from which the question was generated /!\ only for BARThez)
          "question": "A qui est destiné le projet de texte ?", (Generated question)
          "answer": "aux étudiants de sociologie", (Answer for which the question was generated /!\ only for BARThez)
          "answer_stringIds": [ (StringIds of the answer /!\ useful for mapping it back to the original data)
            "string_17",
            "string_18",
            "string_19",
            "string_20"
          ]
        },
        ...
      ]
    }
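The page format above can be read with the standard library alone. The following is a minimal sketch, not code from the repository: the function names `iter_page_files` and `load_page_questions` are hypothetical, and it assumes that the actual files are valid JSON (the parenthesised annotations above being documentation only) and that every page file sits somewhere below a QUESTIONS_* directory. Because the entries of "questions" are described as either plain question texts or full objects, the sketch accepts both.

```python
import json
from pathlib import Path


def iter_page_files(root):
    """Yield every .json page file found below `root`, in sorted order."""
    yield from sorted(Path(root).rglob("*.json"))


def load_page_questions(page_path):
    """Return (page_name, list of question texts) for one page file."""
    with open(page_path, encoding="utf-8") as f:
        page = json.load(f)

    texts = []
    for entry in page.get("questions", []):
        if isinstance(entry, str):
            # Entry is only the text of the question.
            texts.append(entry)
        else:
            # Entry is an object with a "question" field.
            texts.append(entry["question"])
    return page["page_name"], texts
```

A typical use would be to loop over `iter_page_files("datas/QUESTIONS_BARTHEZ")` and call `load_page_questions` on each path; the exact location of the QUESTIONS_* directories inside the repository is an assumption here.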

    Collection graphs

    The graphs directory contains the graph of the collection in the different forms studied in the article:

    • only_question_barthez : G_q in the article
    • only_question_mistral : G_mistral in the article
    • question_answer_barthez : G_qa in the article
    • textblocks : G_para in the article

    They can easily be reused with NetworkX or other graph libraries, as follows:

    import networkx as nx
    G = nx.read_gexf("datas/graphs/only_question_barthez.gexf", relabel=True)

    User exploration path

    All articles browsed by users are stored in the JSON file "dict_article_user.json". It is a dictionary in which each key is an anonymised user and each value is the list of articles browsed by that user.

    /!\ A few visited articles are absent from the graphs and from the generated question files because no questions could be generated from them.
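Because of this caveat, it can help to separate, for each user, the browsed articles that have generated questions from those that do not. The sketch below is one possible way to do so and is not part of the repository: `load_user_paths` and `split_paths_by_coverage` are hypothetical names, the default file path is an assumption, and the set of known articles is assumed to use the same identifiers as the "article" field of the page files.

```python
import json


def load_user_paths(path="dict_article_user.json"):
    """Load the {user: [article, ...]} dictionary (file location is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def split_paths_by_coverage(user_paths, known_articles):
    """For each user, split browsed articles into covered and missing ones.

    `known_articles` is any iterable of article identifiers for which
    questions exist; browsing order and repeated visits are preserved.
    """
    known = set(known_articles)
    result = {}
    for user, articles in user_paths.items():
        result[user] = {
            "covered": [a for a in articles if a in known],
            "missing": [a for a in articles if a not in known],
        }
    return result
```

The set of known articles could, for instance, be built by collecting the "article" value of every page file in a QUESTIONS_* directory.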