    Topic classifier for biomedical articles

    Multilabel topic classifier for medical articles.

    This system learns a topic classifier for articles labelled with multiple topics. The included model uses a variant of BERT pre-trained on medical texts and fine-tunes it on task instances.
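
    A multilabel setup commonly represents each article's topics as a multi-hot vector with one position per known topic. The following is a minimal sketch of that encoding only; the helper names and the topic values are hypothetical and not taken from this repository, and the actual training code may build its targets differently.

```python
# Sketch: turn per-article topic lists into multi-hot target vectors.
# Helper names and topic values are made up for illustration.

def build_topic_index(articles):
    """Map every topic seen in the data to a fixed column index."""
    topics = sorted({t for a in articles for t in a["topics"]})
    return {t: i for i, t in enumerate(topics)}


def encode_topics(article_topics, topic_index):
    """Return a 0/1 list with a 1 for each topic the article carries."""
    vector = [0] * len(topic_index)
    for t in article_topics:
        vector[topic_index[t]] = 1
    return vector


articles = [
    {"topics": ["Treatment", "Diagnosis"]},
    {"topics": ["Prevention"]},
]
topic_index = build_topic_index(articles)
targets = [encode_topics(a["topics"], topic_index) for a in articles]
```

    Each article can carry several 1s at once, which is what distinguishes this from single-label classification.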

    Data

    Input data is expected to be a JSON-formatted file containing a list of articles. Each article should have a title, an abstract, and a topics field containing a list of topics.
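
    As an illustration of the layout described above, the sketch below writes a one-article sample file and checks that each entry has the expected fields. It assumes the field names are literally `title`, `abstract` and `topics`; the sample values and the `check_articles` helper are hypothetical and not part of this repository.

```python
import json
import os
import tempfile

# Assumed field names, inferred from the description of the input data.
REQUIRED_FIELDS = ("title", "abstract", "topics")


def check_articles(articles):
    """Return a list of problems found; an empty list means the layout looks right."""
    if not isinstance(articles, list):
        return ["top-level JSON value must be a list of articles"]
    problems = []
    for i, article in enumerate(articles):
        if not isinstance(article, dict):
            problems.append(f"article {i}: not an object")
            continue
        for field in REQUIRED_FIELDS:
            if field not in article:
                problems.append(f"article {i}: missing '{field}'")
        if "topics" in article and not isinstance(article["topics"], list):
            problems.append(f"article {i}: 'topics' must be a list")
    return problems


# Made-up sample content, only to show the structure.
sample = [
    {
        "title": "An example title",
        "abstract": "An example abstract.",
        "topics": ["Treatment", "Diagnosis"],
    }
]

path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f, indent=2)

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(check_articles(loaded))
```

    Running a check like this before training can catch malformed entries early, before a long fine-tuning run starts.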

    Installing

    virtualenv -p python3 env
    source env/bin/activate
    pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html

    Training

    python trainer.py [options]
    
    optional arguments:
      -h, --help            show this help message and exit
      --gpus GPUS
      --nodes NODES
      --name NAME
      --fast_dev_run
      --train_filename TRAIN_FILENAME
      --learning_rate LEARNING_RATE
      --batch_size BATCH_SIZE
      --epochs EPOCHS
      --valid_size VALID_SIZE
      --max_len MAX_LEN
      --bert_flavor BERT_FLAVOR
      --selected_features SELECTED_FEATURES

    Example training command line:

    python trainer.py --gpus=-1 --name test1 --train_filename ../scrappers/data/20200529/litcovid.json

    PyTorch Lightning provides a TensorBoard logger. You can inspect the training logs with:

    tensorboard --logdir lightning_logs

    Then point your browser to http://localhost:6006/.

    Generating predictions

    python predict.py --checkpoint checkpoints/epoch\=0-val_loss\=0.2044.ckpt --test_filename ../scrappers/data/20200529/cord19-metadata.json > predicted.json