JMIR Preprints #27293: Machine Learning Classification Models for COVID-19 Test Prioritization in Brazil


Machine Learning Classification Models for COVID-19 Test Prioritization in Brazil

Íris Viana dos Santos Santana;
Andressa C. M. da Silveira;
Álvaro Sobrinho;
Lenardo Chaves e Silva;
Leandro Dias da Silva;
Danilo Freire de Souza Santos;
Edmar Candeia;
Angelo Perkusich

ABSTRACT

Background:

Controlling the COVID-19 outbreak in Brazil is considered a challenge of continental proportions due to high population and urban density, weak implementation and maintenance of social distancing strategies, and limited testing capacity.

Objective:

To help address this challenge, we present the implementation and evaluation of supervised machine learning (ML) models that assist COVID-19 detection in Brazil based on early-stage symptoms.

Methods:

First, we preprocessed a Brazilian dataset composed mainly of early-stage symptoms and applied the chi-square test to perform statistical analyses. Afterward, we implemented ML models using the Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), Decision Tree (DT), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost) algorithms. We evaluated the ML models using precision, accuracy score, recall, area under the curve (AUC), and the Friedman and Nemenyi tests. Based on this comparison, we grouped the top five ML models and measured feature importance.
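The chi-square step tests whether a symptom is associated with the COVID-19 test result. As a minimal sketch, assuming each symptom and the test result are binary (present/absent, positive/negative), the Pearson statistic for a 2x2 contingency table can be computed with the standard library alone. The function name `chi_square_2x2` and the counts in the usage line are hypothetical, not taken from the study's dataset. The sketch omits the Yates continuity correction that `scipy.stats.chi2_contingency` applies to 2x2 tables by default.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    Hypothetical helper. table[i][j] is a count: rows = symptom present/absent,
    columns = test positive/negative. No Yates continuity correction is applied.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    n = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of symptom and test result.
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, chi2 = Z**2, so P(X >= stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for one symptom (e.g., fever), for illustration only.
stat, p = chi_square_2x2([[30, 10], [20, 40]])
print(f"chi2 = {stat:.3f}, p = {p:.2e}")
```

A small p-value suggests the symptom and the test result are not independent, which is one way to justify keeping that symptom as a model feature.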

Results:

The MLP model achieved the highest mean accuracy score (> 97.85%), compared with GBM (> 97.39%), RF (> 97.36%), DT (> 97.07%), XGBoost (> 97.06%), KNN (> 95.14%), and SVM (> 94.27%). Based on the statistical comparison, we grouped MLP, GBM, DT, RF, and XGBoost as the top five ML models because their evaluation results were statistically indistinguishable. The features the ML models relied on most during prediction varied across models and included gender, profession, fever, sore throat, dyspnea, olfactory disorder, cough, runny nose, taste disorder, and headache.
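The "statistically indistinguishable" grouping rests on the Friedman test followed by the Nemenyi post hoc test. The sketch below assumes every model was scored on the same set of cross-validation folds: it ranks models within each fold, computes the Friedman statistic from the average ranks (without tie correction), and uses the Nemenyi critical difference (CD) to list the models whose average rank lies within CD of the best one. The helper names, the three models, and their scores are hypothetical, and this grouping rule is one common reading of a Nemenyi comparison rather than necessarily the authors' exact procedure. For a p-value, `scipy.stats.friedmanchisquare` provides a library implementation.

```python
import math

# Nemenyi critical values q_alpha for alpha = 0.05, k = 2..10 models,
# as commonly tabulated (e.g., Demsar, 2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def rank_row(scores):
    """Rank one fold's scores: 1 = highest score; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def friedman_nemenyi(score_table):
    """score_table[f][m] = score of model m on fold f (higher is better)."""
    n = len(score_table)      # number of folds
    k = len(score_table[0])   # number of models
    rank_rows = [rank_row(row) for row in score_table]
    avg_ranks = [sum(r[m] for r in rank_rows) / n for m in range(k)]
    # Friedman statistic; compare with chi-square(k - 1) critical values.
    chi2_f = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    # Two models differ significantly if their average ranks differ by >= cd.
    cd = Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6 * n))
    return avg_ranks, chi2_f, cd


def indistinguishable_from_best(names, avg_ranks, cd):
    """Models whose average rank is within the critical difference of the best."""
    best = min(avg_ranks)
    return [name for name, r in zip(names, avg_ranks) if r - best < cd]


# Hypothetical accuracy scores: 4 folds x 3 models (A, B, C).
scores = [[0.97, 0.96, 0.90],
          [0.98, 0.95, 0.91],
          [0.96, 0.97, 0.89],
          [0.97, 0.96, 0.92]]
ranks, chi2_f, cd = friedman_nemenyi(scores)
print(ranks, chi2_f, round(cd, 3))
print(indistinguishable_from_best(["A", "B", "C"], ranks, cd))
```

Here A and B fall within the critical difference of each other while C does not. The Nemenyi step is only meaningful once the Friedman test has rejected the hypothesis that all models perform equally.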

Conclusions:

Supervised ML models effectively assist decision-making in medical diagnosis and public administration (e.g., testing strategies) based on early-stage symptoms that do not require advanced or expensive exams.

Citation

Please cite as:

Viana dos Santos Santana Í, C. M. da Silveira A, Sobrinho Á, Chaves e Silva L, Dias da Silva L, Freire de Souza Santos D, Candeia E, Perkusich A

Classification Models for COVID-19 Test Prioritization in Brazil: Machine Learning Approach

J Med Internet Res 2021;23(4):e27293

DOI: 10.2196/27293

PMID: 33750734

PMCID: 8034680


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.


Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jan 25, 2021

Open Peer Review Period: Jan 20, 2021 - Mar 17, 2021

Date Accepted: Mar 21, 2021

Date Submitted to PubMed: Apr 9, 2021

