Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: May 3, 2021
Date Accepted: Aug 5, 2021
Date Submitted to PubMed: Aug 12, 2021

The final, peer-reviewed published version of this preprint can be found here:

Information Retrieval in an Infodemic: The Case of COVID-19 Publications

Teodoro D, Ferdowsi S, Borissov N, Kashani E, Vicente Alvarez D, Copara J, Gouareb R, Naderi N, Amini P

J Med Internet Res 2021;23(9):e30161

DOI: 10.2196/30161

PMID: 34375298

PMCID: 8451964

Information Retrieval in an Infodemic: The Case of COVID-19 Publications

  • Douglas Teodoro; 
  • Sohrab Ferdowsi; 
  • Nikolay Borissov; 
  • Elham Kashani; 
  • David Vicente Alvarez; 
  • Jenny Copara; 
  • Racha Gouareb; 
  • Nona Naderi; 
  • Poorya Amini

Background:

The COVID-19 global health crisis has led to an exponential surge in published scientific literature. In an attempt to tackle the pandemic, extremely large COVID-19–related corpora are being created, sometimes containing inaccurate information, at a scale that human analysis can no longer handle.

Objective:

In the context of searching for scientific evidence in the deluge of COVID-19–related literature, we present an information retrieval methodology for effective identification of relevant sources to answer biomedical queries posed using natural language.

Methods:

Our multistage retrieval methodology combines probabilistic weighting models and reranking algorithms based on deep neural architectures to boost the ranking of relevant documents. The similarity between COVID-19 queries and documents is computed, and a series of postprocessing methods is applied to the initial ranked list to improve the match between the query and the biomedical information source and boost the position of relevant documents.
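As an illustration of the general idea only, the following minimal sketch shows a two-stage pipeline: a probabilistic weighting model (Okapi BM25 is assumed here) selects candidates, and a second scorer reorders them. It is not the authors' implementation. The function names, the parameter defaults k1 = 1.2 and b = 0.75, and the term-overlap scorer standing in for a deep neural reranker are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def two_stage_rank(query, docs, first_stage_k, rerank_fn):
    """Keep the top first_stage_k BM25 candidates, then reorder them with rerank_fn."""
    scores = bm25_scores(query, docs)
    by_bm25 = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    candidates = by_bm25[:first_stage_k]
    # sorted() is stable, so ties in rerank_fn keep their BM25 order.
    return sorted(candidates, key=lambda i: rerank_fn(query, docs[i]), reverse=True)


def overlap_scorer(query, doc):
    """Hypothetical stand-in for a neural reranker: fraction of query terms in the document."""
    return len(set(query) & set(doc)) / len(set(query))


# Toy corpus (invented for this example).
docs = [
    ["covid", "vaccine", "trial"],
    ["influenza", "vaccine"],
    ["covid", "transmission", "masks"],
    ["football", "results"],
]
query = ["covid", "vaccine"]
ranking = two_stage_rank(query, docs, first_stage_k=3, rerank_fn=overlap_scorer)
```

In a real system the second stage would be a trained language model scoring each query-document pair; the overlap scorer only keeps the sketch self-contained.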

Results:

The methodology was evaluated in the context of the TREC-COVID challenge, achieving results competitive with those of the top-ranking teams participating in the competition. In particular, the combination of bag-of-words and deep neural language models significantly outperformed an Okapi Best Match 25–based baseline, retrieving, on average, 83% of relevant documents in the top 20.
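One common way to quantify how many relevant documents appear among the first 20 results is precision at k, averaged over query topics. Whether this is the exact metric behind the 83% figure is an assumption here; the sketch below only illustrates the computation, and the function names and toy data are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the top-k ranked documents that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not len(top_k)) penalizes rankings shorter than k.
    return hits / k


def mean_precision_at_k(runs, k=20):
    """Average precision at k over (ranked_ids, relevant_ids) pairs, one per topic."""
    return sum(precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


# Toy judgments for two topics (invented for this example).
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6", "d7", "d8"], {"d5", "d6", "d7", "d8"}),
]
average = mean_precision_at_k(runs, k=4)
```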

Conclusions:

These results indicate that multistage retrieval supported by deep learning could enhance identification of literature for COVID-19–related questions posed using natural language.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.
