Commit 8f9d71d4 authored by Benoit Favre

add info on sources

parent a82050a9
@@ -19,3 +19,16 @@ Designed to be run at most once a day.
 ```
 ./run.sh
 ```
+Sources
+-------
+* litcovid: NIH-curated list of COVID-19 articles (https://www.ncbi.nlm.nih.gov/research/coronavirus/)
+Labels (8): General Information, Mechanism, Transmission, Diagnosis, Treatment, Prevention, Case Report, Epidemic Forecasting
+Note that topic labels are semi-automatically assigned. See for details https://www.ncbi.nlm.nih.gov/research/coronavirus/faq
+* bibliovid: Paper categories and fine-grained analysis by experts (https://bibliovid.org/)
+Labels (7): Autres, Diagnostique, Thérapeutique, Épidémiologique, Pronostique, Recommandations, Modélisation
+Labels (19): Hépato-gastro-entérologie, Neurologie, Cardiologie et maladies métaboliques, Hématologie, Gériatrie, Infectiologie, Gynécologie Obstétrique, Dermatologie, Pédiatrie, Pneumologie, Transversale, Psychiatrie, Virologie, Anesthésie-Réanimation, Radiologie, Hygiène, Néphrologie, Confinement/Déconfinement, Immunité
+* CORD-19 metadata: large set of papers metadata selected with broad queries on general coronavirus research
@@ -8,6 +8,10 @@ from datetime import datetime, date
 pubmed = PubMed(tool="https://covid19.lis-lab.fr", email="benoit.favre@univ-amu.fr")
+if len(sys.argv) != 3:
+    print('usage: %s <input> <output>' % sys.argv[0], file=sys.stderr)
+    sys.exit(1)
 with open(sys.argv[1]) as fp:
     articles = json.loads(fp.read())
@@ -43,7 +47,8 @@ for article in articles['results']:
     if not found:
         print('NOT FOUND:', title, file=sys.stderr)
-print(json.dumps(articles, indent=2))
+with open(sys.argv[2], 'w') as fp:
+    fp.write(json.dumps(articles, indent=2))
 print('TOTAL', len(articles['results']), file=sys.stderr)
 for key, value in stats.items():
......
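The two hunks above move the script from printing JSON on stdout to taking explicit input and output paths. A minimal sketch of that pattern follows, assuming the enrichment step can be stood in for by a hypothetical `add_abstracts` placeholder (not a function from this repository) and that the input has a top-level `results` list, as the `articles['results']` references suggest.

```python
import json
import sys


def add_abstracts(articles):
    """Hypothetical stand-in for the abstract lookup; returns the data unchanged."""
    return articles


def main(argv):
    # Require exactly two arguments: an input path and an output path.
    if len(argv) != 3:
        print('usage: %s <input> <output>' % argv[0], file=sys.stderr)
        return 1
    with open(argv[1]) as fp:
        articles = json.load(fp)
    articles = add_abstracts(articles)
    # JSON goes to the output file; diagnostics stay on stderr.
    with open(argv[2], 'w') as fp:
        json.dump(articles, fp, indent=2)
    print('TOTAL', len(articles['results']), file=sys.stderr)
    return 0
```

As a script this would be wired up with `sys.exit(main(sys.argv))`. One likely benefit of writing to a named file rather than stdout is that a stray `print` from the script or a library can no longer corrupt the JSON consumed by the next stage, though the commit message does not state the motivation.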
@@ -24,7 +24,7 @@ python "$dir/litcovid_add_abstract.py" "$out/litcovid_stage1.json" > "$out/litco
 count=`curl 'https://bibliovid.org/api/v1/posts?format=json' | python -mjson.tool | grep '"count":' | grep -o '[0-9]*'`
 curl "https://bibliovid.org/api/v1/posts?format=json&offset=0&limit=$count" | python -mjson.tool > "$out/bibliovid_stage1.json"
 python "$dir/bibliovid_scrapper.py" "$out/bibliovid_stage1.json" > "$out/bibliovid_stage2.json"
-python "$dir/bibliovid_add_abstract.py" "$out/bibliovid_stage2.json" > "$out/bibliovid_stage3.json"
+python "$dir/bibliovid_add_abstract.py" "$out/bibliovid_stage2.json" "$out/bibliovid_stage3.json"
 python "$dir/bibliovid_normalize.py" "$out/bibliovid_stage3.json" > "$out/bibliovid.json"
 # cleanup
......
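The context lines of this hunk fetch the bibliovid posts in two requests: one to read the total `count`, and one that asks for everything with `limit=$count`. The same idea can be sketched in Python as below. The URL and query parameters are taken from the shell lines above; the function names and the injectable `get_json` parameter are assumptions introduced here so the logic can be exercised without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = 'https://bibliovid.org/api/v1/posts'


def http_get_json(url):
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def fetch_all_posts(get_json=http_get_json):
    # First request: only used to learn the total number of posts.
    first = get_json(API + '?' + urlencode({'format': 'json'}))
    count = first['count']
    # Second request: ask for every post in a single page.
    query = urlencode({'format': 'json', 'offset': 0, 'limit': count})
    return get_json(API + '?' + query)
```

Reading `count` from the parsed JSON avoids the `grep` on pretty-printed output, which depends on how `json.tool` formats the field. Whether the API accepts an arbitrarily large `limit` is not confirmed by the diff.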