add info on sources

@@ -19,3 +19,16 @@ Designed to be run at most once a day.
* litcovid: NIH-curated list of COVID-19 articles (
Labels (8): General Information, Mechanism, Transmission, Diagnosis, Treatment, Prevention, Case Report, Epidemic Forecasting
Note that topic labels are semi-automatically assigned. See for details
* bibliovid: Paper categories and fine-grained analysis by experts (
Labels (7): Autres, Diagnostique, Thérapeutique, Épidémiologique, Pronostique, Recommandations, Modélisation
Labels (19): Hépato-gastro-entérologie, Neurologie, Cardiologie et maladies métaboliques, Hématologie, Gériatrie, Infectiologie, Gynécologie Obstétrique, Dermatologie, Pédiatrie, Pneumologie, Transversale, Psychiatrie, Virologie, Anesthésie-Réanimation, Radiologie, Hygiène, Néphrologie, Confinement/Déconfinement, Immunité
* CORD-19 metadata: large set of paper metadata, selected with broad queries on general coronavirus research
@@ -8,6 +8,10 @@ from datetime import datetime, date
pubmed = PubMed(tool="", email="")
+if len(sys.argv) != 3:
+    print('usage: %s <input> <output>' % sys.argv[0], file=sys.stderr)
with open(sys.argv[1]) as fp:
    articles = json.loads(
@@ -43,7 +47,8 @@ for article in articles['results']:
    if not found:
        print('NOT FOUND:', title, file=sys.stderr)
-print(json.dumps(articles, indent=2))
+with open(sys.argv[2], 'w') as fp:
+    fp.write(json.dumps(articles, indent=2))
print('TOTAL', len(articles['results']), file=sys.stderr)
for key, value in stats.items():
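The hunk above moves this stage from printing its JSON on stdout to writing it to the path given as the second argument. A minimal, self-contained sketch of that pattern follows. `run_stage`, `main` and `transform` are hypothetical names that do not appear in the script, and returning a non-zero status after the usage message is an assumption, since the corresponding line is not visible in the diff.

```python
import json
import sys


def run_stage(input_path, output_path, transform=lambda articles: articles):
    """Read JSON from input_path, apply transform, write JSON to output_path."""
    with open(input_path) as fp:
        articles = json.load(fp)
    articles = transform(articles)
    with open(output_path, 'w') as fp:
        json.dump(articles, fp, indent=2)
    return articles


def main(argv):
    # Expect exactly: <script> <input> <output>
    if len(argv) != 3:
        print('usage: %s <input> <output>' % argv[0], file=sys.stderr)
        return 1  # assumption: the real script exits here
    articles = run_stage(argv[1], argv[2])
    # Diagnostics go to stderr, as in the original script.
    print('TOTAL', len(articles['results']), file=sys.stderr)
    return 0
```

A script built this way would end with `sys.exit(main(sys.argv))`. With an explicit output path the stage no longer depends on shell redirection, which matches the change to the calling line in the shell script.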
@@ -24,7 +24,7 @@ python "$dir/" "$out/litcovid_stage1.json" > "$out/litco
count=`curl '' | python -mjson.tool | grep '"count":' | grep -o '[0-9]*'`
curl "$count" | python -mjson.tool > "$out/bibliovid_stage1.json"
python "$dir/" "$out/bibliovid_stage1.json" > "$out/bibliovid_stage2.json"
-python "$dir/" "$out/bibliovid_stage2.json" > "$out/bibliovid_stage3.json"
+python "$dir/" "$out/bibliovid_stage2.json" "$out/bibliovid_stage3.json"
python "$dir/" "$out/bibliovid_stage3.json" > "$out/bibliovid.json"
# cleanup
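The two `curl` calls in this hunk first read the `"count"` field of a response and then issue a second request whose URL ends in `$count`, presumably so that all entries come back in one response. The same two-step fetch can be sketched in Python as below. The endpoint is not visible in the diff, so the sketch takes a hypothetical `fetch_json` callable instead of a URL, and `fetch_all` is likewise an illustrative name.

```python
import json


def fetch_all(fetch_json, output_path):
    """Two-step fetch: read the total count, then request that many items.

    fetch_json is a hypothetical callable: it takes a limit (or None for
    the default request) and returns the decoded JSON response.
    """
    # Step 1: the default request is only used to learn the total count.
    count = int(fetch_json(None)['count'])
    # Step 2: request everything at once, using the count as the limit.
    payload = fetch_json(count)
    # Pretty-print to disk, like `python -mjson.tool > ..._stage1.json`.
    with open(output_path, 'w') as fp:
        json.dump(payload, fp, indent=2)
    return payload
```

Passing the fetch function in keeps the sketch independent of the real API and makes it easy to exercise with a stub.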
