The DEFT'23 shared task consists of answering multiple-choice questions (MCQs) from pharmacy exams. This system converts each question and its candidate answers into a prompt and uses LLMs to generate the answers.
The approach is described in our [paper](http://talnarchives.atala.org/CORIA-TALN/CORIA-TALN-2023/479307.pdf). It ranked 1st at the shared task.
This repository contains scripts to generate prompts, run off-the-shelf models and finetune the LLaMa models. It also contains the LoRA weights for the finetuned models.
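
As an illustration of the prompt conversion mentioned above, here is a minimal sketch in Python. The `build_prompt` function, the `(a)`-style layout, the trailing `Answer:` cue and the sample question are all assumptions made for this example. They are not the templates used by the scripts in this repository.

```python
def build_prompt(question: str, answers: dict[str, str]) -> str:
    """Format one MCQ as a text prompt.

    Hypothetical template for illustration only; the repository's
    scripts may lay out questions and answers differently.
    """
    lines = [question]
    # List the candidate answers in alphabetical order of their letter.
    for letter in sorted(answers):
        lines.append(f"({letter}) {answers[letter]}")
    # Cue the model to continue with the letter(s) of its answer.
    lines.append("Answer:")
    return "\n".join(lines)


# Made-up question, not taken from the DEFT'23 data.
example = build_prompt(
    "Which of these is a vitamin?",
    {"a": "Ascorbic acid", "b": "Sodium chloride", "c": "Ethanol"},
)
print(example)
```

The resulting string can be sent to any text-completion model, and the letters it generates are then read back as the predicted answer set.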
This repository uses git LFS for large files.
Run `git lfs install` before cloning to retrieve the binary files.
Install:
```
pip install -r requirements.txt
```
Installing bitsandbytes for the LLaMa models is a bit more [involved](https://gitlab.lis-lab.fr/cluster/wiki/-/wikis/Compiling%20bitsandbytes%20for%20int8%20inference): it may need to be recompiled to support your CUDA version.
See RESULTS for the exact match results on the dev set.
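
For readers unfamiliar with the metric, the sketch below shows one common way to compute an exact match rate for MCQs. It assumes a question counts as correct only when the predicted set of answer letters is identical to the reference set. The `exact_match_rate` function is hypothetical, and the scoring behind RESULTS may differ in its details.

```python
def exact_match_rate(predictions, references):
    """Fraction of questions whose predicted answer set equals the gold set.

    Assumption: each item is an iterable of answer letters, and the
    order of the letters does not matter.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # A question is a hit only if both sets contain exactly the same letters.
    hits = sum(set(p) == set(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Made-up predictions and references: two of the three questions match.
score = exact_match_rate(
    [["a"], ["a", "b"], ["c"]],
    [["a"], ["b", "a"], ["d"]],
)
print(score)
```

Under this definition a partially correct answer, such as predicting only `a` when the reference is `a` and `b`, scores zero for that question.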
See runs.sh for how to generate runs.
Note that external APIs require API keys. Rename api_keys.template.py to api_keys.py and set the keys you need inside.
Please cite the following paper:
```
@inproceedings{Favre:CORIA-TALN:2023,
author = "Favre, Benoit",
title = "LIS@DEFT'23 : les LLMs peuvent-ils r\'epondre \`a des QCM ? (a) oui; (b) non; (c) je ne sais pas.",
booktitle = "Actes de CORIA-TALN 2023. Actes du D\'efi Fouille de Textes@TALN2023",
month = "6",
year = "2023",
address = "Paris, France",
publisher = "Association pour le Traitement Automatique des Langues",