Data analytics for novel coronavirus disease

https://doi.org/10.1016/j.imu.2020.100374
Open access under a Creative Commons license

Abstract

This paper describes different aspects of novel coronavirus disease (COVID-19), presents visualizations of the spread of the infection, and discusses potential applications of data analytics to this viral infection. First, a literature survey of COVID-19 is presented, covering its origin, its similarity to previous coronaviruses, its transmission capacity, and its symptoms, among other factors. Second, data analytics is applied to a Johns Hopkins University dataset to trace the spread of the infection. Although the disease emerged in China in December 2019, the USA had the highest number of confirmed cases as of June 04, 2020. Third, the worldwide increase in confirmed cases over time is modelled using polynomial regression of degree 2. Fourth, classification algorithms are applied to a dataset of 5644 samples provided by Hospital Israelita Albert Einstein of Brazil in order to diagnose COVID-19. Multilayer perceptron (MLP), XGBoost and logistic regression are shown to classify COVID-19 patients with an accuracy above 91%. Finally, potential applications of data analytics to several important aspects of COVID-19 are discussed.
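
As a rough illustration of what a degree-2 polynomial regression involves, the sketch below fits y = a*x^2 + b*x + c by ordinary least squares using the normal equations. It is not the authors' implementation: the function names `fit_quadratic` and `predict` are hypothetical, the input values are synthetic placeholders rather than real case counts, and in practice a library routine such as `numpy.polyfit` would normally be used instead of hand-written elimination.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    # Power sums: s[k] = sum(x^k) for k = 0..4, t[k] = sum(y * x^k) for k = 0..2
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system in the unknowns (c, b, a)
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution
    sol = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        acc = m[i][3] - sum(m[i][j] * sol[j] for j in range(i + 1, 3))
        sol[i] = acc / m[i][i]
    c, b, a = sol
    return a, b, c


def predict(coeffs, x):
    """Evaluate the fitted quadratic at x."""
    a, b, c = coeffs
    return a * x * x + b * x + c


# Synthetic placeholder series (NOT real COVID-19 data): day index vs. count
days = [float(d) for d in range(10)]
cases = [3.0 * d * d + 2.0 * d + 5.0 for d in days]

coeffs = fit_quadratic(days, cases)
forecast = predict(coeffs, 10.0)  # extrapolate one day past the series
```

Because the synthetic series is exactly quadratic, the fit recovers its generating coefficients up to rounding error. Real cumulative case counts are noisy, so a quadratic fit to them is only an approximation of the trend.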

Keywords

Coronavirus
COVID-19
Classification
Machine learning
Regression
SARS-CoV-2
