1 Introduction

The World Health Organization (WHO) declared the novel coronavirus disease 2019 (COVID-19) a Public Health Emergency of International Concern on January 30, 2020 [1, 2]. By then, there were 7818 confirmed cases of COVID-19 globally, with more than 1370 severe cases and 170 deaths, the bulk of which were found in China [3]. Over the course of a few weeks, the disease propagated beyond the borders of China, reaching nearly every country. At the time of writing this paper (May 01, 2020), there was a total of 2,397,216 confirmed cases globally, with 162,956 deaths [4]. Symptoms of the disease include dry cough, sore throat, and fever. Although the majority of cases are mild, some can lead to Acute Respiratory Distress Syndrome (ARDS), severe pneumonia, pulmonary oedema, and organ failure [5]. After the WHO emergency declaration, several modeling and prediction studies were carried out to understand the propagation of the disease, evaluate the preventive measures put in place by authorities, and provide early and accurate detection of the disease, to name a few aims. Mathematical modeling has been used for many years in epidemiological studies [6]. Mathematical modeling of disease transmission and propagation helps to predict the course of epidemics and to design mass vaccination programs, and it can also provide guidance on what type of data are relevant to the study of epidemics [7]. Studies carried out on COVID-19 include modeling the dynamics of the disease, exploring the effect of prevention methods such as travel restrictions, and studying the effect of climate on its propagation [8]. Artificial intelligence (AI), on the other hand, is a tool used for prediction. AI is the study and development of algorithms (machines) that mimic human intelligence. 
AI has been successfully used in several fields such as computer vision, online advertising, spam filtering, robotics, and fraud detection [9, 10]. In healthcare, AI has also gained attention for disease detection, treatment selection, patient monitoring, drug discovery, gene function annotation, automated experiments, and automated data collection, among others [11, 12]. With regard to COVID-19, AI has been used in medical image acquisition, image segmentation, and diagnosis [13]. This paper presents a review of the mathematical modeling and artificial intelligence methods used in the study, estimation, and prediction of COVID-19. The paper is divided into three parts: the first presents the mathematical models used in the study of the pandemic, the second presents the various AI applications in disease diagnosis and estimation, and the third presents a list of available datasets for COVID-19.

2 Material and method

The review is divided into three parts, each dealing with a specific aspect: mathematical modeling, AI applications, and available datasets. For each of the three parts, the items reviewed were grouped into topics, and each group was then summarized. In all, a total of 61 journal articles, reports, fact sheets, and websites were reviewed. All of the items reviewed were published between December 2019 and April 2020. Table 1 shows the structure of the review, including the number of items reviewed and the main focus of the reviewed items.

Table 1 The breakdown of the review showing number of items covered per part

3 Mathematical modeling and COVID-19

Various research works in the literature have modeled the dynamics and spread of COVID-19. Most of these were based on the Susceptible-Exposed-Infected-Removed (SEIR) model and the Susceptible-Infected-Recovered (SIR) model. These models were widely used in the past to study epidemic spreading over various forms of transmission networks [14,15,16,17,18,19,20,21,22]. Table 2 gives a summary of the various models used in COVID-19 studies. The following subsections review these models.

Table 2 Summary of the various mathematical models used in COVID-19 studies

3.1 Susceptible-exposed-infected-removed (SEIR)

Choujun et al. [23] combined daily intercity migration data with a SEIR model to generate a new model that describes the dynamics of COVID-19 in China. They collected the daily intercity migration data from 367 cities using a mobile application that tracks human migration. They concluded that the number of infections in most cities in China would peak between the middle of February and early March 2020. Anca and Kieran adapted a traditional SEIR model to study the specific dynamic compartments and epidemic parameters of COVID-19 [24]. They analyzed the current management strategy of the pandemic, including social distancing, travel bans, and service interruptions and closures, in order to generate predictions and assess the efficiency of these control measures. In [25], a combination of SEIR and regression models was used with the Johns Hopkins University dataset on COVID-19 to predict changes in the spread of COVID-19. The study presented in [26] used an age-structured SEIR model to evaluate physical distancing measures. The authors showed that physical distancing measures were most effective if the gradual return to work started in April. Xiao-Jing et al. [27] used the SEIR model to study the transmission of COVID-19 and its association with temperature and humidity. The study found that higher temperature and humidity contributed to the control of transmission of the disease. In [28], the SEIR model was adapted to investigate the potential community-wide impact of public use of face masks on the transmission dynamics and control of the COVID-19 pandemic. It was suggested that face masks should be adopted nationwide and implemented immediately (Table 3).
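To make the compartmental structure shared by these studies concrete, the following is a minimal sketch of a basic SEIR model integrated with a forward-Euler scheme. It is not the implementation of any reviewed study: the function name `simulate_seir`, the symbols `beta` (transmission rate), `sigma` (rate of leaving the exposed compartment), `gamma` (removal rate), and all numeric values are assumptions chosen only for illustration.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate a basic SEIR model with forward-Euler steps.

    Returns one (S, E, I, R) tuple per day, starting with the initial state.
    All parameter names are illustrative assumptions, not taken from a cited study.
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r                      # total population, kept constant
    steps_per_day = int(round(1 / dt))
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt    # S -> E
            new_infectious = sigma * e * dt        # E -> I
            new_removed = gamma * i * dt           # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


# Hypothetical parameter values, for illustration only.
trajectory = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                           s0=990, e0=0, i0=10, r0=0, days=30)
```

Interventions such as distancing or mask use are typically represented in models of this kind by lowering `beta` over time; how each reviewed study did so is described in the corresponding reference.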

Table 3 Summary of the classifications methods used in COVID-19 studies

3.2 Susceptible-infected-recovered (SIR)

A time-dependent susceptible-infected-recovered (SIR) model that tracks the transmission rate and the recovery rate at a particular time was proposed in [29]. The authors obtained a prediction error of 3% or less for confirmed cases and predicted that the recovery rate overtook the transmission rate on February 17, 2020 in the Hubei province of China. Wang et al. [30] modified the SIR model by adding different types of time-varying quarantine strategies, such as government-imposed mass isolation policies and micro-inspection measures at the community level, to establish a method of calibrating cases of under-reported infections. The SIR model was also used to fit the cumulative data of COVID-19 in China to an empirical form [31]. It was reported that, for given parameter values, the SIR model on the Euclidean network obtained high accuracy on data from China and could predict when the pandemic would be expected to be over. In [32], a simple age-sensitive SIR model that integrated known age-interaction contact patterns was studied to examine the potential effects of age-heterogeneous mitigations on an epidemic in a COVID-19-like parameter regime. The authors found that strict age-targeted mitigation strategies had the potential to reduce mortality. An age-structured SIR model with social contact matrices and Bayesian imputation was used to evaluate the progress of the pandemic in India [33]. The authors evaluated the influence of social distancing measures, such as workplace non-attendance and school closure, on the transmission of the novel coronavirus. It was found that a three-week lockdown would be insufficient to prevent the spread of the disease. A simple SIR model modified to include certain variables of containment measures taken worldwide was used to study these measures [34]. By comparing various scenarios, it was shown that the progress of the infection is strongly affected by the measures taken.
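As an illustration of what tracking a transmission rate and a recovery rate over time can mean in practice, the sketch below inverts the discrete-time SIR update to recover per-step rates from observed series of infected and removed counts. It is a simplified reading under stated assumptions (a closed population and a one-step discrete model), not the estimation procedure of [29]; the names `sir_step` and `estimate_rates` are hypothetical.

```python
def sir_step(s, i, r, beta, gamma):
    """Advance a discrete-time SIR model by one time step."""
    n = s + i + r
    new_infections = beta * s * i / n
    new_removals = gamma * i
    return s - new_infections, i + new_infections - new_removals, r + new_removals


def estimate_rates(infected, removed, population):
    """Recover per-step beta(t) and gamma(t) from observed I(t) and R(t).

    Uses R(t+1) - R(t) = gamma * I(t) and
    I(t+1) - I(t) = beta * S(t) * I(t) / N - gamma * I(t).
    """
    betas, gammas = [], []
    for t in range(len(infected) - 1):
        i_t, r_t = infected[t], removed[t]
        s_t = population - i_t - r_t
        delta_i = infected[t + 1] - i_t
        delta_r = removed[t + 1] - r_t
        gammas.append(delta_r / i_t)
        betas.append((delta_i + delta_r) * population / (s_t * i_t))
    return betas, gammas


# Synthetic check with hypothetical constant rates beta=0.3, gamma=0.1.
state = (990.0, 10.0, 0.0)
infected_series, removed_series = [state[1]], [state[2]]
for _ in range(5):
    state = sir_step(*state, beta=0.3, gamma=0.1)
    infected_series.append(state[1])
    removed_series.append(state[2])
betas, gammas = estimate_rates(infected_series, removed_series, population=1000.0)
```

On real data the estimated rates vary from day to day, and comparing the two series shows when removals start to outpace new infections.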

3.3 Other models

A Susceptible-Infectious-Quarantined-Recovered (SIQR) model was used to analyze data from Brazil [35]. It was found that the number of quarantined individuals grew exponentially and then stabilized. A Susceptible-Exposed-Infectious-Quarantined-Recovered (SEIQR) model with time delays for latency and an asymptomatic phase was also investigated [36]. It was reported that time-varying social distancing, using the SEIQR model, could reduce the number of infections by about 50%. A novel model known as the Bats-Hosts-Reservoir-People transmission network model was used to simulate the potential transmission from bats (the infection source) to humans [37]. In another approach, the age-specific Susceptible-Exposed-Symptomatic-Asymptomatic-Recovered-Seafood Market (SEIARW) model, based on two suspected transmission routes, was used to quantify age-specific transmission [38]. The two routes were from market to person and from person to person. The authors concluded that COVID-19 transmissibility is higher in elderly persons than in young persons. In [39], the influence of interventions and self-protection measures (travel restriction, quarantine of entry, contact tracing, isolation, and wearing masks) on COVID-19 transmission dynamics in mainland China excluding Hubei province was modeled using the Markov Chain Monte Carlo (MCMC) method. The results showed that the containment strategies were effective and substantially suppressed the transmission of the pandemic. It was also found that relaxing personal protection too early might lead to the spread of the disease. The SPSS modeler was also used to investigate the correlation between average daily temperatures and the growth rate of COVID-19 in infected countries [40]. It was shown that the pandemic growth rates were higher in case studies where the average temperature was lower. Finally, in [41], a coupled ordinary differential equation metapopulation model for different courses of the disease in different age groups was developed. 
It was shown that the economic lockdown could be safely reversed at any time without a substantial effect on the course of the disease. In addition, it was concluded that strict quarantines might not be necessary to keep the number of infected people low.

4 Artificial intelligence and COVID-19

Artificial intelligence (AI) has been used mostly for medical image segmentation and diagnosis, to classify whether a patient has COVID-19 or to determine the severity of the infection. The images used in these works were mostly from X-ray radiography or Computed Tomography (CT). Before presenting the AI methodologies used in COVID-19 detection and classification, a brief description of these medical imaging modalities is given.

4.1 COVID-19 detection based on CT scan

X-ray radiography consists of beaming X-ray photons onto the part of the body to be imaged and collecting the photons that pass through it. Depending on its type, the body tissue attenuates (blocks) some of the incident photons. This creates a shadow image of the body on a detector located behind it. X-ray radiography is used to examine bone structure and detect infections in the lungs. Computed tomography (CT) takes the idea of X-ray radiography further by taking X-ray images of the body from multiple angles to produce cross-sectional images without dissecting the body. These cross-sectional images, also called slices, are tomographic images and contain more detailed medical information than conventional X-ray radiography. CT images are used to detect abnormalities in the body such as tumors and hemorrhage; they can also be used to detect pulmonary embolisms, excess fluid, and pneumonia in the lungs [42, 43]. This makes CT suitable for the diagnosis of COVID-19, a disease that attacks the lungs and the respiratory system.
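The attenuation described above is commonly summarized, for a single-energy beam, by the Beer-Lambert law: the transmitted intensity falls off exponentially with the attenuation coefficient of each tissue multiplied by its thickness. The sketch below is only an idealized illustration under that single-energy assumption; the function name and the coefficient values are hypothetical and are not taken from the cited references.

```python
import math


def transmitted_intensity(incident_intensity, segments):
    """Beer-Lambert attenuation along one ray.

    `segments` is a list of (attenuation_coefficient_per_cm, thickness_cm)
    pairs for the tissues the ray crosses, in any order.
    """
    optical_depth = sum(mu * thickness for mu, thickness in segments)
    return incident_intensity * math.exp(-optical_depth)


# Hypothetical coefficients: a ray crossing weakly attenuating tissue only,
# and a ray that also crosses a denser, more attenuating region.
ray_soft = transmitted_intensity(100.0, [(0.05, 10.0)])
ray_dense = transmitted_intensity(100.0, [(0.05, 8.0), (0.4, 2.0)])
```

The denser path transmits fewer photons, which is why dense structures appear as distinct shadows on the detector.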

In their study, Pan Feng et al. sought to characterize the changes observed in the chest images of patients with COVID-19 pneumonia. The study was carried out at 4-day intervals from the first day of diagnosis to the day of total recovery. Patients with complicated pneumonia with severe respiratory distress were excluded from the study. For non-severe cases, the chest CT results showed that lesion severity progressed during the first 10 days and then stabilized. According to this study, almost all the patients presented a peak of the disease around the 10th day and signs of improvement around the 14th day of symptoms [44]. In a series of experiments carried out over 3 days on 51 patients, Yicheng Fang et al. compared the performance of two medical examination methods on patients with COVID-19. The results indicated that the sensitivity of chest CT for COVID-19 was higher than that of the RT-PCR technique (98% for CT versus 71% for RT-PCR). When RT-PCR tests are negative, chest CT can therefore be used on patients with clinical and epidemiological characteristics of COVID-19 to confirm or refute the previous results [45]. Li Yan et al. also conducted a study to determine the rate of false diagnoses and the performance of CT scans on COVID-19. Their study was carried out on the first 51 patients confirmed by nucleic acid tests. The study confirmed the high performance of chest CT, which produced a low rate of false diagnosis of COVID-19 [46].

4.2 Image based (X-ray, CT) AI COVID-19 detection and classification

Classification consists of separating images into groups. The three standard, well-known approaches to doing so are supervised learning, unsupervised learning, and semi-supervised learning.

Supervised learning is a machine learning task in which a function is learned from example input-output pairs [47]. The purpose of a supervised learning algorithm is to produce a function that maps each input (a vector) to its output (the supervision signal). In an optimal scenario, the algorithm will correctly determine the class labels of unseen data. In human psychology, the parallel task is called concept learning [48]. The supervised learning algorithms used for the detection of COVID-19 include Convolutional Neural Network (CNN), Support Vector Machines (SVM), Logistic Regression (LR), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Decision Trees (DT) and Random Forest (RF). Table 3 shows the summary of the classification methods used in COVID-19 studies.

Table 4 A collection of the open source dataset sources and their links

4.2.1 Convolutional neural network (CNN)

The principle of a Neural Network (NN) is based on a collection of nodes (called artificial neurons) that loosely model neurons in the brain. Based on examples, without any prior knowledge and without being explicitly programmed, the system automatically generates identifying characteristics. When the algorithm uses multiple layers of neurons, it is known as deep learning. A Convolutional Neural Network (CNN) is a deep learning algorithm that takes an image as input and assigns learnable weights to various features (objects) in the image so as to be able to differentiate one image from another [9, 10].
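The core operation behind the learnable weights of a CNN is the sliding of a small kernel over the image. The following is a minimal sketch of a single convolution (implemented, as is usual in deep learning, as a cross-correlation) followed by a ReLU activation. The kernel is fixed by hand here purely for illustration; in a trained CNN its values would be learned. Function names are assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of rows), stride 1, no padding."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


# A hand-made kernel that responds where intensity rises from left to right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]
feature_map = relu(conv2d_valid(image, edge_kernel))
```

The resulting feature map is non-zero only in the column where the dark-to-bright transition occurs, which is the sense in which a kernel detects a feature.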

Wang et al. [49] used a CNN with a dataset comprising 13,800 chest X-ray radiography images from 13,725 patients in order to provide clinicians with deeper insight into the critical factors associated with COVID-19 cases. They reported an accuracy, sensitivity, and positive predictive value (PPV) of 92.6%, 87.1%, and 96.4% respectively. In [50], three CNN-based models (ResNet50, InceptionV3, and InceptionResNetV2) were proposed for detecting COVID-19 in pneumonia-infected patients from chest X-ray radiography images. The authors used ROC analyses and confusion matrices to evaluate the performance of the three models and found that the ResNet50 model provided the best classification performance, with an accuracy of 98%. In a retrospective, multi-center study carried out by Li et al. [51], a CNN was employed for the detection of COVID-19. They extracted visual features from volumetric chest CT images of COVID-19 patients and classified them. They reported that the method was able not only to detect COVID-19 cases but also to distinguish them from other community-acquired pneumonia and non-pneumonic lung diseases. In [52], transfer learning (where available data from one scenario is used to enhance the accuracy of detection in a second scenario where data is lacking) was used on X-ray images from patients with ordinary bacterial pneumonia, confirmed COVID-19 cases, and other normal infections. The goal of the work was to evaluate the performance of some state-of-the-art CNN architectures for medical image classification. The authors obtained an accuracy, sensitivity, and specificity of 96.78%, 98.66%, and 96.46% respectively and concluded that CNNs with X-ray imaging might extract significant biomarkers related to COVID-19. Hemdan et al. [53], for their part, implemented seven different CNN architectures with the aim of assisting radiologists in the automatic diagnosis of COVID-19 in X-ray images. 
They validated the architectures on 50 chest X-ray images, half of which were confirmed COVID-19 cases. They reported that the VGG19 and Dense Convolutional Network (DenseNet) models had the best performance, both with an accuracy of 90%.
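The accuracy, sensitivity, specificity, and PPV figures quoted throughout this section are all derived from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts used in the example are invented for illustration and do not correspond to any of the reviewed studies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common diagnostic metrics from confusion-matrix counts.

    tp/fn: diseased cases correctly/incorrectly classified,
    tn/fp: non-diseased cases correctly/incorrectly classified.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),      # true positive rate (recall)
        "specificity": tn / (tn + fp),      # true negative rate
        "ppv": tp / (tp + fp),              # positive predictive value
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=45, fp=10, tn=40, fn=5)
```

Because these metrics weight errors differently, a model can have high accuracy but low sensitivity when the classes are imbalanced, which is why the reviewed studies usually report several of them together.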

4.2.2 Support vector machines (SVM)

Support Vector Machines (SVM) are supervised learning methods used for regression, classification, and outlier detection. The aim of SVM is to find a hyperplane in an N-dimensional space (where N is the number of features) that distinctly classifies the input data. In other words, SVM seeks the hyperplane that has the maximum distance to the nearest data points of the separate classes. Support vectors are the data points that are closest to the hyperplane. These data points affect the position and orientation of the hyperplane [67].
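One simple way to find such a maximum-margin hyperplane for a linear SVM is sub-gradient descent on the regularized hinge loss. The sketch below illustrates that idea on a toy two-dimensional dataset. It is an assumption-laden teaching example (hypothetical function names, hand-picked learning rate and regularization) rather than the solver used in the reviewed studies, which typically rely on dedicated libraries and may use non-linear kernels.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean hinge loss; labels must be -1 or +1."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]          # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:                       # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return the side of the hyperplane on which x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
```

Only the points that violate the margin contribute to the update, which mirrors the statement that the support vectors determine the position and orientation of the hyperplane.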

Barstugan et al. [54] presented an early detection method for COVID-19 based on SVM. The algorithm was applied to abdominal Computed Tomography (CT) images. Four image datasets of different sizes (16x16, 32x32, 48x48, 64x64) were created from 150 CT images. Features were extracted through Grey Level Co-occurrence Matrix (GLCM), Local Directional Pattern (LDP), Grey Level Run Length Matrix (GLRLM), Grey-Level Size Zone Matrix (GLSZM), and Discrete Wavelet Transform (DWT) algorithms. SVM was then used to classify the extracted features. A maximum sensitivity and accuracy of 97.56% and 98.71% respectively were obtained with 10-fold cross-validation and the GLSZM feature extraction method. In [55], a combination of a deep feature extractor and SVM was used to detect COVID-19 infection in X-ray images. The proposed model (a combination of ResNet50 and SVM) obtained an accuracy of 95.38%. In [57], SVM was used on features extracted from chest X-ray radiography images for early detection of COVID-19 cases. The features were extracted through multi-level thresholding of the images. The authors obtained a classification accuracy of 98.82% on a total of 40 contrast-enhanced chest X-ray images.

SVM was also applied to non-image data. De Moraes et al. [56] used SVM and data from emergency care admission exams to detect COVID-19 cases. They collected data from 235 patients, of whom 43% were confirmed COVID-19 cases. They trained five machine learning algorithms, namely logistic regression, random forests, gradient boosting trees, neural networks, and support vector machines, on 70% of the patients and evaluated their performance on the remaining 30%. They found that the SVM had the best performance, with an accuracy of 85%, and concluded that the method could be used to identify which patients need a laboratory COVID-19 test.

4.2.3 Logistic regression (LR)

In statistics, logistic regression is used to model the probability of an event: each sample is assigned a probability between 0 and 1. It can be extended to model several classes of events, for example to identify different objects in an image [68]. Although simpler than the CNN, logistic regression can also be applied to the in-depth study of the manifestation of COVID-19. For instance, in [58], logistic regression was applied to values provided by ROC analysis with the aim of investigating clinical and CT features that indicate the severity of COVID-19. Through logistic regression analyses, it was found that the clinical factors associated with severe/critical COVID-19 pneumonia were age over 50 years, chest pain, dyspnea, comorbidities, and cough, among others. In [59], deep features were extracted from chest X-ray images of COVID-19 patients using ResNet152, and SMOTE was then used to balance the data points of COVID-19 and normal patients. Finally, machine learning algorithms such as Random Forest and XGBoost were used to classify the images according to these features. The authors obtained an accuracy of 97.3% for Random Forest and 97.7% for XGBoost.
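The way logistic regression turns a weighted sum of features into a probability between 0 and 1 can be sketched in a few lines. The example below fits a one-feature model by plain gradient descent on the log-loss; the data, learning rate, and function names are assumptions made for illustration, and the reviewed studies used standard statistical software rather than this hand-written loop.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability that a sample belongs to the positive class."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(samples, labels, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj / n
            grad_b += error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Toy data: one hypothetical feature, label 1 for the larger values.
samples = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]
weights, bias = fit_logistic(samples, labels)
```

A fitted positive weight indicates that larger values of the feature raise the predicted probability, which is how factors associated with severity are read off such models.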

4.2.4 Naive Bayes (NB)

Naive Bayes classifiers are among the simplest Bayesian network models in the family of probabilistic classifiers. Coupled with kernel density estimation, they can reach high levels of accuracy in digital image classification [69]. Naive Bayes has also been used for classification in the study of COVID-19. In [60], the authors combined conventional statistical and machine learning methods to extract features from CT images. The extracted features were then classified by a hybrid classifier system based on Naive Bayes. Experimental evaluation of this method produced an accuracy of 96.07%.
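The sketch below illustrates the Naive Bayes decision rule using a Gaussian density for each feature. Note that this swaps in a Gaussian assumption in place of the kernel density estimation mentioned above, purely to keep the example short; the feature values and function names are hypothetical, and the hybrid classifier of [60] is not reproduced here.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate a prior plus a per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(samples), means, variances)
    return model


def predict_nb(model, x):
    """Pick the class with the highest log-posterior, treating features as independent."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, mean, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical two-feature samples for two classes.
samples = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.1], [4.0, 5.0], [4.2, 5.1], [3.9, 4.8]]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(samples, labels)
```

The "naive" part is the sum over features inside `predict_nb`: each feature contributes independently to the class score.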

4.2.5 Linear discriminant analysis (LDA)

Linear discriminant analysis (LDA) is used in pattern recognition and machine learning to find a linear combination of features that characterizes or separates classes of objects or events. The resulting combination can be used as a linear classifier or for dimensionality reduction before the final classification [70].
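For two classes, the linear combination sought by LDA has a closed form, Fisher's discriminant: the direction is the inverse of the within-class scatter matrix applied to the difference of the class means. The sketch below computes it for samples with exactly two features so that the matrix inverse can be written out by hand; the data and function names are illustrative assumptions.

```python
def fisher_lda_direction(class0, class1):
    """Return w proportional to inverse(Sw) * (m1 - m0) for 2-feature samples."""
    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(2)]

    def scatter(rows, m):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
        return s

    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Within-class scatter matrix and its explicit 2x2 inverse.
    sw = [[s0[a][b] + s1[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(w, x):
    """Reduce a 2-feature sample to a single discriminant score."""
    return w[0] * x[0] + w[1] * x[1]


# Hypothetical samples from two well-separated classes.
class0 = [[1, 2], [2, 1], [2, 3], [3, 2]]
class1 = [[6, 5], [7, 6], [6, 7], [5, 6]]
w = fisher_lda_direction(class0, class1)
```

Projecting each sample onto `w` gives a single score per sample; thresholding that score yields a linear classifier, while keeping the score as a feature is the dimensionality-reduction use.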

LDA was used in [61] with the aim of investigating the characteristics and patterns of hematological changes in patients suffering from COVID-19. Clinical and laboratory test results of the patients were analyzed, and different hematological parameters were fitted using LDA. The NLR&RDWSD combined parameter was found to be the best indicator of the severity of COVID-19 in patients, with an accuracy of 93.8%.

4.2.6 Decision trees (DT) and random forest (RF)

A decision tree is a technique that supports decision analysis by identifying the strategy most likely to lead to a goal. A Random Forest, for its part, is essentially a collection of decision trees whose results are aggregated into a final result. Random forests can limit variance without increasing the error due to bias. In medical practice, they are used to classify patient images [71]. In [62], the chest CT images of 176 patients with COVID-19 were used for severity assessment. A random forest model was trained to evaluate the severity of COVID-19 in patients based on quantitative features. The RF model showed encouraging results, with an accuracy of 87.5%. Shi et al. proposed an infection Size Aware Random Forest method (iSARF) consisting of two steps: the first consisted of categorization into different groups, while the second classified the images [63]. They used an infection size feature defined as the ratio of the volume of infected regions to the total volume of the whole segmented lung. This infection size was then used in a 3-level Random Forest classifier that classified it into 4 groups. They used 5-fold cross-validation to evaluate the performance of the proposed algorithm and compared it with other classifiers such as logistic regression, support vector machine, and neural network (NN). They obtained a sensitivity, specificity, and accuracy of 90.7%, 83.3%, and 87.9% respectively.
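The infection size feature described above is a simple ratio and can be sketched directly from two segmentation masks. The example assumes that both masks are flattened sequences of 0/1 voxel labels of equal length and that all voxels have the same physical volume, so that a voxel count ratio equals a volume ratio. The function name and the masks are hypothetical and are not taken from [63].

```python
def infection_size(lung_mask, infection_mask):
    """Ratio of infected lung voxels to all lung voxels.

    Both masks are flat sequences of 0/1 values of equal length;
    infected voxels outside the lung mask are ignored.
    """
    lung_voxels = 0
    infected_voxels = 0
    for in_lung, infected in zip(lung_mask, infection_mask):
        if in_lung:
            lung_voxels += 1
            if infected:
                infected_voxels += 1
    if lung_voxels == 0:
        raise ValueError("lung mask is empty")
    return infected_voxels / lung_voxels


# Hypothetical masks: four lung voxels, one of which is infected.
lung = [1, 1, 1, 1, 0, 0]
infection = [1, 0, 0, 0, 1, 0]
ratio = infection_size(lung, infection)
```

A scalar of this kind can then be thresholded to place a scan into one of several size groups before a group-specific classifier is applied.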

4.2.7 U-net

U-Net was first proposed by Ronneberger et al. for the segmentation of biomedical images [72]. The U-Net architecture has two paths, namely a contracting path (the encoder) and an expanding path (the decoder). In the encoder, successive convolutional and max-pooling down-sampling layers are used to extract the context of an image, while in the decoder the discriminative features learnt in the encoder are projected onto the pixel space (image) so as to obtain a semantically segmented image. The decoder is made up of a series of upsampling, concatenation, and convolution operations.
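The data flow of one encoder-decoder level can be sketched without any learning: the encoder halves the resolution with max pooling, the decoder restores it by upsampling, and the result is concatenated with the encoder feature map of the same resolution. In this sketch, nearest-neighbour upsampling stands in for the learned up-convolution, and the convolution layers are omitted entirely; all function names are assumptions.

```python
def max_pool_2x2(feature_map):
    """Down-sample by taking the maximum of each non-overlapping 2x2 block."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


def upsample_2x(feature_map):
    """Nearest-neighbour up-sampling: repeat every value in a 2x2 block."""
    output = []
    for row in feature_map:
        wide = [value for value in row for _ in range(2)]
        output.append(wide)
        output.append(list(wide))
    return output


def concatenate_channels(skip, upsampled):
    """Stack the encoder (skip) map and the decoder map as two channels."""
    return [skip, upsampled]


encoder_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool_2x2(encoder_map)            # contracting path
restored = upsample_2x(pooled)                # expanding path
merged = concatenate_channels(encoder_map, restored)
```

The concatenation step is what lets the decoder combine coarse context from the pooled map with the fine spatial detail preserved in the encoder map.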

U-Net based algorithms were also used in the segmentation of medical images for the purpose of COVID-19 detection. Chen et al. proposed a modified U-Net structure to segment the regions of lungs infected with COVID-19. They used an Aggregated Residual Network (ResNeXt) to learn complex features from the original images. They also applied a soft attention mechanism that enhanced the model's ability to differentiate various symptoms of COVID-19 [64]. In [65], Attention U-Net was used with an adversarial critic model to improve its performance. The authors obtained an average Dice score of 97.8% on 1047 chest X-ray images from three sources. In [66], two methods were proposed, namely Inf-Net and Semi-Inf-Net. Inf-Net uses implicit Reverse Attention and explicit Edge Attention to improve the detection of infected regions in CT lung images. Semi-Inf-Net is a semi-supervised solution that helps to overcome the lack of high-quality labeled images. The authors carried out extensive experiments on COVID-19 datasets and showed that the proposed methods perform better than other segmentation methods.
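The Dice score used to evaluate segmentation measures the overlap between a predicted mask and a reference mask as twice the size of their intersection divided by the sum of their sizes. A minimal sketch follows, assuming flattened binary masks of equal length; the handling of two empty masks is a convention chosen here, not something specified in the reviewed papers.

```python
def dice_score(predicted, reference):
    """Dice coefficient between two flat binary masks of equal length."""
    intersection = sum(1 for p, r in zip(predicted, reference) if p and r)
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * intersection / total


# Hypothetical masks sharing one of their two foreground pixels each.
predicted_mask = [1, 1, 0, 0]
reference_mask = [1, 0, 1, 0]
score = dice_score(predicted_mask, reference_mask)
```

A score of 1 means the two masks coincide exactly and 0 means they do not overlap at all.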

4.2.8 Unsupervised learning

Unsupervised learning has also been used in the study of COVID-19. Unlike supervised learning, unsupervised learning searches for previously undetected patterns in a dataset without pre-existing labels and with minimum human intervention. It makes it possible to model probability densities over the inputs. It also makes it possible to detect anomalous data points that do not belong to any group, and it has applications in density estimation in statistics [10]. Among the unsupervised learning methods used in COVID-19 studies is k-means clustering, which is a vector quantization algorithm. It partitions n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster [73].
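The two alternating steps of k-means (assign each observation to the nearest mean, then recompute each mean) can be sketched as follows. The initial centroids are passed in explicitly to keep the example deterministic; in practice they are usually chosen at random or by a seeding heuristic. The data and function name are illustrative assumptions.

```python
def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    centroids = [list(c) for c in initial_centroids]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for idx, point in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(point, centroid))
                         for centroid in centroids]
            assignments[idx] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Two hypothetical, well-separated groups of 2-D observations.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, initial_centroids=[(0, 0), (10, 10)])
```

No labels are used at any point: the grouping emerges only from the distances between observations.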

5 Datasets of COVID-19

In both mathematical modeling and AI, data is the raw material, so the first step in the development of COVID-19 applications is data collection. Over the course of a few months, multiple datasets related to COVID-19 have been put online. Most if not all of these datasets are open source, meaning that they are free for anyone to download and use. They are also constantly being updated with new data from the field. Table 4 presents a collection of the open source datasets explored. The following paragraphs describe these datasets.

Dong et al. [74] currently provide one of the most complete databases of the COVID-19 situation. The database, known as the 2019 Novel Coronavirus Visual Dashboard, is operated and maintained by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). They obtained data from about 18 sources, such as the WHO, the CDC, and other government agencies, then compiled and shared them in the form of an interactive map of the COVID-19 situation. The database includes the daily numbers of infections, active cases, recoveries, and deaths. It also contains the location (state/province, country, longitude, latitude), the number of people tested, the incidence rate, and the hospitalization rate. Xu et al. [75] are currently collecting and sharing health information on persons with COVID-19 from the local to the national level, together with other information from online reports. The data are geographically localized and also include details such as symptoms, dates of confirmation and admission, and travel history.

Cohen et al. [76] created a COVID-19 Image Database by collecting X-ray images from various websites as well as publications. The database is made up of 345 X-ray images. Zhao et al. [77] created a computed tomography (CT) image database that currently (May 01, 2020) contains 349 images of confirmed COVID-19 cases along with 398 images of non-COVID-19 cases. The CT images were gathered from several COVID-19-related papers. Ma et al. provided a dataset containing 20 COVID-19 CT images labeled for the left lung, the right lung, and the infection type. The labeling was done by two radiologists [78]. The aim of their work was to establish a benchmark for CT image segmentation of lungs with regard to COVID-19. Two radiologists based in Oslo, Norway have shared two CT datasets: the COVID-19 CT segmentation dataset (with 100 axial CT slices) and the Segmentation dataset nr. 2 (with 829 CT slices) from more than 60 patients [79]. The datasets were manually segmented by radiology experts.

Chen et al. shared a COVID-19 Twitter dataset [80]. This dataset contains an ongoing collection of tweet IDs associated with COVID-19, which started on January 28, 2020. The tracked keywords include “COVID-19”, “Coronavirus”, “Pandemic” and so on. They also tracked certain accounts such as those of the WHO, CDCgov, and HHSGov. As of May 01, 2020, they had collected more than 10 million tweets in many languages. Rabindra [81] is also collecting tweets using an LSTM model deployed on a website. The model continuously monitors the real-time Twitter feed for COVID-19-related tweets. It uses filters such as language “en” and Twitter keywords like “covid19”, “coronavirus”, “covid” and so on. As of May 01, 2020, more than 30 million tweets had been collected.

Harvard Dataverse also provides the Global News dataset, which contains COVID-19-related global English news from GDELT [82], and the climate dataset, which contains time series of temperature, humidity, air quality, and other monitored data in China from January 1, 2020 [83]. The Coronacases Initiative, a pro bono initiative of RAIOSS Desenvolvimento Ltda and Livon Saúde Ltda, also provides information on COVID-19 cases on its website [84].

In COVID-19 and other pandemic studies, datasets on population density, mobility, security incidents, the economic situation, humanitarian conditions, and the healthcare workforce are important for ensuring the accuracy of the studies. Several sources provide such datasets. One such source is WorldPop, which shares spatial demographic datasets from Africa, Asia, and Central and South America [85]. Some of the datasets provided by WorldPop are population data, births, internal migration, age and sex data, administrative areas, and global flight data. The Humanitarian Data Exchange (HDX), coordinated by the UN Office for the Coordination of Humanitarian Affairs (OCHA), shares more than 17,000 humanitarian datasets from 253 locations around the globe [86]. The WHO, for its part, shares the Global Health Workforce Statistics [87]. The dataset includes data on the number of health workers as well as hospital bed capacity in each country. The tech giants Apple and Google both released mobility reports on COVID-19. Apple called its dataset Mobility Trends Reports [88], while Google called its dataset Google COVID-19 Community Mobility Reports [89]. Both present aggregated data that register the daily use of various modes of transportation (walking, driving, transit) since the start of February 2020, as well as places visited or stayed in by users of their services. The data were collected from customer requests for directions or location in Apple Maps and in Google Maps. Both also offer a useful visualization tool for the data. Our World in Data, for its part, provides a COVID-19 Testing dataset, which collects data on tests carried out to establish whether a person is currently infected [90]. 
ACAPS [91] provides the Government Measures Dataset, which covers measures implemented by governments around the world in response to COVID-19, while the Armed Conflict Location & Event Data Project (ACLED) [92] provides a dataset of security incidents related to COVID-19. The International Monetary Fund (IMF) [93] and BFA Global [94] both provide datasets on the key economic responses of governments and the effect of COVID-19 management measures on the economy.

Lastly, the software provider C3.ai compiled, cleaned, structured and standardized COVID-19 data from most of the sources presented in this paper [95]. The initiative, known as the C3.ai COVID-19 Data Lake, gathers analysis-ready COVID-19 data in one place. The service is free and the datasets are updated continuously. It contains everything from time-series data to case reports. In addition, a GitHub repository was created to collect AI research papers and datasets concerning COVID-19 images. It contains 19 datasets, 11 review papers, 18 clinical papers on COVID-19 images, 54 AI-related papers, 54 articles on CXR methods, and 1 paper on Line Artefact Quantification in Lung Ultrasound Images [96].

6 Discussion and conclusion

Applying mathematical modeling and AI to COVID-19 data will improve our understanding of disease propagation, support the evaluation of prevention measures, and enable early and accurate detection of the disease in patients. However, reaching this goal requires a large amount of data with which to explore various models and AI algorithms. The data available so far are mostly medical images (for diagnosis) and text-based data (for social impact analysis). While the latter may be generated by, and are readily available to, a large number of people, the former can only be generated in a specialized institution by a specialized professional. As a result, such data are not available from low-resource settings, which lack the sophisticated imaging equipment needed to generate these images [97]. It is also well known in data science that datasets from different geographical locations may not carry the same information, and this is especially true of healthcare data. More data types are therefore needed that can be generated easily anywhere in the world, so that mathematical models and AI algorithms can be applied by many. These data types could be physiological measurements such as ECG, SpO2, and body temperature, which can be obtained using wearable devices [98]. Data on the preventive measures implemented by authorities are also not well documented. In this work, only a few of the datasets found provided that information. Yet this information could help in examining and optimizing the measures in place, thereby improving the situation.

In mathematical modeling, most of the articles found while writing this paper address COVID-19 dynamics. However, with appropriate datasets, modeling can also explore the effect of variables such as climate and preventive measures on the spread of COVID-19, as explained earlier. There are also few studies on the correlation between environmental and climatic conditions and COVID-19 propagation. In this work only two articles were found that address this issue, and they provide different and interesting ways of looking at the propagation of the disease [27, 28]. Simulation of second and third waves of COVID-19 outbreaks will also help to enhance surveillance. As countries start easing social restriction measures, studies are needed to estimate possible hotspots for new outbreaks.
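
As an illustration of how a later wave can be simulated, the following is a minimal sketch of an SEIR model in which the transmission rate changes when restrictions are imposed and later eased. It is not taken from any of the reviewed studies. The population size, the rates `sigma` and `gamma`, the three values of the transmission rate and the dates of the policy changes are all illustrative assumptions, not values fitted to COVID-19 data, and the forward Euler integration is chosen only for simplicity.

```python
def simulate_seir(beta_of_day, sigma, gamma, population, exposed0, days, dt=0.1):
    """Integrate the SEIR equations with forward Euler.

    Returns a list with one (S, E, I, R) tuple per day, starting at day 0.
    beta_of_day maps a day index to the transmission rate on that day.
    """
    s, e, i, r = float(population - exposed0), float(exposed0), 0.0, 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for day in range(days):
        beta = beta_of_day(day)
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt   # S -> E
            new_infectious = sigma * e * dt                # E -> I
            new_removed = gamma * i * dt                   # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


# Hypothetical policy scenarios (all numbers are assumptions for illustration).
def sustained_restrictions(day):
    return 0.5 if day < 30 else 0.1      # restrictions start on day 30 and stay


def eased_restrictions(day):
    if day < 30:
        return 0.5                       # no restrictions
    if day < 90:
        return 0.1                       # strict restrictions
    return 0.3                           # partial easing from day 90


POPULATION = 1_000_000
SIGMA = 1 / 5.2   # assumed rate of leaving the exposed class (per day)
GAMMA = 1 / 7.0   # assumed removal rate (per day)

sustained_run = simulate_seir(sustained_restrictions, SIGMA, GAMMA, POPULATION, 100, 300)
eased_run = simulate_seir(eased_restrictions, SIGMA, GAMMA, POPULATION, 100, 300)

print("removed after 300 days, sustained restrictions:", round(sustained_run[-1][3]))
print("removed after 300 days, eased restrictions:", round(eased_run[-1][3]))
print("peak infectious after easing:", round(max(state[2] for state in eased_run[90:])))
```

Comparing the two runs shows the qualitative effect discussed above: under these assumed parameters the outbreak dies out while restrictions are kept, whereas easing them allows the number of infectious individuals to rise again. Estimating hotspots would additionally require spatially resolved data such as the mobility and population datasets listed in the previous section.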

AI (deep learning) is a powerful tool for early and accurate diagnosis of COVID-19, and many articles have addressed it. Most of them apply convolutional neural networks (CNN) for medical image classification. A few other studies apply random forests and support vector machines. Some also apply U-Net and its variations to the segmentation of CT and X-ray images. The authors of the AI algorithms reviewed here all claimed that their algorithms perform very well on test data. However, it is well known that good performance on test data does not guarantee similar performance when an algorithm is deployed in the field. This is due to the fact that real-life data are more prone to noise and other artefacts that are not usually present in the training and test data. The lack of diverse annotated images does not help the situation either. In this review, only 2 out of 18 studies were found to have used data annotated by radiologists. Collaboration between clinicians and AI experts is needed in order to build a large collection of annotated images of COVID-19. Human-in-the-loop approaches or human augmentation can be another way to overcome the disparity between an algorithm's performance on test data and its performance in the real world. Most of the studies reviewed used existing models, while a few used well-known models with some modifications. The modified models performed slightly better than the others, stressing the need to develop hybrid models in order to build better and more robust architectures. Much work also needs to be done in terms of drug and/or vaccine discovery, treatment selection and contamination risk assessment for medical personnel [99]. Finally, since the objective of most AI research on COVID-19 is to find the optimal solution for diagnosis, other algorithms such as Genetic Programming and Boosting (AdaBoost) should be explored so as to clear any doubt regarding their performance.
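
The gap between test and field performance can be illustrated with a toy example. The sketch below is not a reproduction of any reviewed method: it uses a hypothetical one-dimensional feature, a simple midpoint (nearest-centroid) classifier in place of a CNN, and synthetic Gaussian data. The class centres, the noise levels and the sample sizes are all assumptions chosen only to make the effect visible.

```python
import random


def make_samples(n, extra_noise_std, rng):
    """Draw n labelled samples: class 0 centred at -2, class 1 at +2 (unit spread).

    extra_noise_std adds acquisition noise that is absent from the curated data.
    """
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 if label == 1 else -2.0
        value = rng.gauss(centre, 1.0)
        if extra_noise_std > 0:
            value += rng.gauss(0.0, extra_noise_std)
        samples.append((value, label))
    return samples


def fit_threshold(samples):
    """Nearest-centroid rule in one dimension: midpoint between the class means."""
    class0 = [value for value, label in samples if label == 0]
    class1 = [value for value, label in samples if label == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2


def accuracy(samples, threshold):
    """Fraction of samples whose side of the threshold matches their label."""
    correct = sum(1 for value, label in samples if (value > threshold) == (label == 1))
    return correct / len(samples)


rng = random.Random(0)                      # fixed seed for repeatability
train_set = make_samples(2000, 0.0, rng)    # curated training data
test_set = make_samples(2000, 0.0, rng)     # held-out data from the same source
field_set = make_samples(2000, 3.0, rng)    # assumed noisier real-world data

threshold = fit_threshold(train_set)
test_accuracy = accuracy(test_set, threshold)
field_accuracy = accuracy(field_set, threshold)

print("accuracy on held-out test data:", test_accuracy)
print("accuracy on noisy field data:", field_accuracy)
```

The held-out test set is drawn from the same distribution as the training set, so it cannot reveal the drop in accuracy that appears once extra noise is present. Evaluation on data collected under field conditions is needed to expose this gap.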

In conclusion, COVID-19 has spread rapidly all over the world, creating an emergency situation. Mathematical modeling and AI have both been shown to be reliable tools in the fight against this pandemic. Most of the modeling was based on the Susceptible-Exposed-Infected-Removed (SEIR) model and the Susceptible-Infected-Recovered (SIR) model, while most of the AI implementations were Convolutional Neural Networks (CNN) applied to X-ray and CT images. Several datasets concerning COVID-19 have been collected and shared openly. However, much work remains to be done in providing the public with a wide variety of data types from as many regions as possible. Other AI and modeling applications in healthcare should also be explored with regard to COVID-19.