Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jan 25, 2021
Open Peer Review Period: Jan 20, 2021 - Mar 17, 2021
Date Accepted: Mar 21, 2021
Date Submitted to PubMed: Apr 9, 2021

The final, peer-reviewed published version of this preprint can be found here:

Classification Models for COVID-19 Test Prioritization in Brazil: Machine Learning Approach

Viana dos Santos Santana Í, C. M. da Silveira A, Sobrinho Á, Chaves e Silva L, Dias da Silva L, Freire de Souza Santos D, Candeia E, Perkusich A

J Med Internet Res 2021;23(4):e27293

DOI: 10.2196/27293

PMID: 33750734

PMCID: 8034680

Machine Learning Classification Models for COVID-19 Test Prioritization in Brazil

  • Íris Viana dos Santos Santana; 
  • Andressa C. M. da Silveira; 
  • Álvaro Sobrinho; 
  • Lenardo Chaves e Silva; 
  • Leandro Dias da Silva; 
  • Danilo Freire de Souza Santos; 
  • Edmar Candeia; 
  • Angelo Perkusich

ABSTRACT

Background:

Controlling the COVID-19 outbreak in Brazil is a challenge of continental proportions due to the high population and urban density, weak implementation and maintenance of social distancing strategies, and limited testing capabilities.

Objective:

To help address this challenge, we present the implementation and evaluation of supervised machine learning (ML) models to assist COVID-19 detection in Brazil based on early-stage symptoms.

Methods:

First, we preprocessed a Brazilian dataset composed mainly of early-stage symptoms and applied the chi-squared test to perform statistical analyses. We then implemented ML models using the Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), Decision Tree (DT), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost) algorithms. We evaluated the ML models using precision, accuracy score, recall, the area under the curve, and the Friedman and Nemenyi tests. Based on the comparison, we grouped the top five ML models and measured feature importance.
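As an illustration of the chi-squared step only, the following minimal sketch computes a Pearson chi-squared test of independence between one binary symptom and the test result. It is not the authors' code: the function name `chi_squared_2x2` and the counts are hypothetical, and it assumes a 2x2 table without continuity correction.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 contingency table.

    table[i][j] is the count of patients with symptom status i and test
    result j. Returns (statistic, p_value) for 1 degree of freedom,
    without continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of symptom and result.
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, NOT taken from the study:
# rows = symptom present / absent, columns = positive / negative result.
counts = [[30, 10], [20, 40]]
stat, p = chi_squared_2x2(counts)
print(f"chi2 = {stat:.2f}, p = {p:.2e}")
```

A small p-value here would suggest that the symptom and the test result are associated, which is the kind of evidence such a test can provide when screening candidate features.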

Results:

The MLP model presented the highest mean accuracy score (>97.85%) when compared with GBM (>97.39%), RF (>97.36%), DT (>97.07%), XGBoost (>97.06%), KNN (>95.14%), and SVM (>94.27%). Based on the statistical comparison, we grouped MLP, GBM, DT, RF, and XGBoost as the top five ML models because their evaluation results are statistically indistinguishable. The importance that the ML models assigned to features during prediction varied among gender, profession, fever, sore throat, dyspnea, olfactory disorder, cough, runny nose, taste disorder, and headache.
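The Friedman test mentioned in the abstract compares models by their ranks across repeated evaluations. The sketch below is a hedged illustration rather than the authors' procedure: the helper names `average_ranks` and `friedman_statistic` and the per-fold scores are hypothetical, and the statistic omits any tie correction.

```python
def average_ranks(scores):
    """Rank one row of scores (highest score gets rank 1), averaging ties."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions in the group
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def friedman_statistic(score_table):
    """score_table[f][m] is the score of model m on fold f.

    Returns (mean_ranks, chi2_f), where chi2_f is the Friedman statistic
    without tie correction.
    """
    n = len(score_table)     # number of folds (blocks)
    k = len(score_table[0])  # number of models (treatments)
    rank_rows = [average_ranks(row) for row in score_table]
    mean_ranks = [sum(r[m] for r in rank_rows) / n for m in range(k)]
    chi2_f = 12 * n / (k * (k + 1)) * (
        sum(r ** 2 for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2_f


# Hypothetical accuracy scores for 3 models over 4 folds, NOT study data.
folds = [
    [0.97, 0.95, 0.94],
    [0.98, 0.96, 0.93],
    [0.97, 0.96, 0.95],
    [0.96, 0.97, 0.94],
]
mean_ranks, chi2_f = friedman_statistic(folds)
print(mean_ranks, chi2_f)
```

The statistic is compared against a chi-squared distribution with k - 1 degrees of freedom. If the test rejects equal performance, a post hoc test such as Nemenyi can indicate which pairs of models differ, and models whose mean ranks are not significantly different can be grouped together.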

Conclusions:

Supervised ML models can effectively assist decision making in medical diagnosis and public administration (e.g., testing strategies) based on early-stage symptoms that do not require advanced and expensive exams.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.
