Abstract

The world is currently facing a pandemic in the form of coronavirus disease (COVID-19). In connection with the spread of confirmed COVID-19 cases and deaths, various researchers have analysed the impact of temperature and humidity on the spread of the coronavirus. In this paper, a deep transfer learning-based exhaustive analysis is performed by evaluating the influence of different weather factors, including temperature, sunlight hours, and humidity. To perform the experiments, two data sets are used: one, taken from Kaggle, consists of official COVID-19 case reports, and the other contains weather data. Moreover, the COVID-19 data are also tested and validated using deep transfer learning models. The experimental results show that temperature, wind speed, and sunlight hours have a significant impact on COVID-19 cases and deaths, whereas humidity does not affect coronavirus cases significantly. It is concluded that the convolutional neural network performs better than the competing model.

1. Introduction

COVID-19 (coronavirus disease) is a transmissible disease caused by the SARS-CoV-2 virus, which originated in the city of Wuhan in Hubei Province, China [1, 2]. Within a span of 3 months, the virus had affected populations across the entire world. The coronavirus is essentially a protein molecule protected by a layer of fat. It is therefore advised to wash hands with soap or an alcohol-based sanitizer, which destroys the fat layer so that the protein molecule disperses. The virus generally affects the human respiratory system, and infected individuals have difficulty breathing. Other symptoms include high fever, cough, and fatigue. Individuals with strong immunity can recover from the infection quickly, whereas for people with weak immunity or underlying conditions such as diabetes or heart disease, the infection can be severe and even deadly. The virus is transmitted through sneezing, coughing, and close contact between people.

It is more dangerous than many other viruses because it can survive on surfaces such as iron, wool, plastic, steel, and copper for anywhere from a few hours to a few days [2]. An infected person can develop symptoms such as high fever, cough, and breathing difficulty within 2 weeks of the day of infection. Because of the very high rate of human-to-human transmission, the WHO declared COVID-19 a pandemic [3]. To date, 35 million people have been infected and more than 1 million have died, and these figures are still rising. Although 26 million people have recovered and vaccines are under development, many new COVID-19 cases and deaths are still being reported worldwide. The most affected countries are the United States, India, the UK, Brazil, and Russia. The distribution of cases, total cases, active cases, and total deaths is shown in Figure 1.

Many researchers have worked to understand the features and nature of the coronavirus. A few researchers have also forecast its spread using various machine learning techniques. However, most of the reported work concerns the identification of COVID-19 symptoms, so there is scope to analyse the impact of weather variables, such as temperature and humidity, on its spread. For this task, we use a data set containing details of COVID-19 infections together with weather parameters. In the present work, we estimate the influence of different weather parameters on confirmed COVID-19 cases and fatalities. Compared with existing works, the present work uses a larger number of features that can affect the worldwide spread of the COVID-19 virus.

Melin et al. [4] analysed the spatial evolution of the COVID-19 pandemic across countries using unsupervised machine learning techniques such as self-organizing maps. The authors concluded that self-organizing maps, with their clustering abilities, enable the categorization of countries on the basis of confirmed COVID-19 cases. The key points of this work are the following:

(i) Analyse the impact of various weather factors on COVID-19 cases and deaths.
(ii) Analyse and predict the growth rates of COVID-19 across countries.
(iii) Predict how this epidemic will end.
(iv) Predict and validate machine learning techniques on weather condition features.

The rest of the paper is structured as follows. Section 2 reviews existing work on the subject. The methodology is discussed in Section 3. Section 4 presents the experiments and the analyses performed. Finally, conclusions and future work are given in Section 5.

2. Related Work

In the last few months, various research papers have been published on the coronavirus. A literature survey is summarized in this section. Contreras et al. [5] presented a study of COVID-19 in a heterogeneous population using the SEIRA model. Many works have been reported in the literature to predict the spread of COVID-19. Dangi and George [6] used a novel weather forecasting method to predict the outbreak of COVID-19 in India. They correlated temperature with coronavirus cases in the five most affected cities worldwide. Sajadi et al. [7] presented a method to identify high-risk zones of COVID-19 spread based on weather modelling. Demongeot et al. [8] demonstrated the impact of temperature on the spread of coronavirus disease. Marvi and Arfeen [9] identified the relationship between the rate of COVID-19 spread across different regions worldwide and the average temperature. They considered two factors, namely the spread rate of the virus and temperature. Xu et al. [10] presented a statistical study on complete data sets of the global spread of the coronavirus. This study included more than 3700 locations worldwide. The authors used several features, such as population density, delay in detection, and time-variant responses. Some weather-related variables were also used to predict the coronavirus spread and its global projection throughout the year. A few researchers argue that temperature and humidity affect the transmission of COVID-19.

A few researchers have used machine learning and artificial intelligence (AI) based techniques for the detection or analysis of COVID-19. Alqudah et al. [11] combined a convolutional neural network (CNN) with support vector machine (SVM) and random forest (RF) classifiers to classify observations into COVID-19 and non-COVID-19 cases. The authors obtained accuracies of 90.5% and 81% for SVM and RF, respectively. Ghoshal and Tucker [12] applied a Bayesian CNN (BCNN) to detect COVID-19 from chest X-rays. They also identified the correlation between uncertainty and prediction accuracy and obtained an accuracy of 90%. Salman et al. [13] detected COVID-19 infection from chest X-ray images using a CNN and attained an accuracy of 100%. Bullock et al. [14] applied machine learning methods for detecting COVID-19. Pham et al. [15] reviewed various research papers on the applications of big data and AI to the current pandemic.

Sujatha and Chatterjee [16] presented a model for predicting the spread of COVID-19 using a multilayer perceptron, a linear regression model, and a vector autoregression model. Barstugan et al. [17] detected COVID-19 at an early stage using machine learning models built on tomography images of the abdomen. Elmousalami and Hassanien [18] used time series models to identify COVID-19 affected cases. Jahangoshai Rezaee et al. [19] presented an approach based on a fuzzy inference system, linguistic FMEA, and fuzzy data envelopment analysis to derive a new score that covers some shortcomings of the RPN and to prioritize HSE risks.

Li et al. [20] presented a new method to analyse the transmission of COVID-19. The authors applied Gaussian distribution theory for prediction and obtained better accuracy and lower error. They performed their analysis on COVID-19 data from China, Italy, Iran, and South Korea. Tomar and Gupta [21] predicted the spread of coronavirus cases using recurrent neural networks equipped with long short-term memory. They also compared the novel COVID-19 with other viruses such as Ebola and SARS. Mahato et al. [22] analysed the effect of lockdown on air pollution in India and found that air quality improved after a long lockdown. Sina et al. [23] prepared a prediction model for the coronavirus using machine learning techniques and soft computing. Li et al. [20] also used a few filter algorithms to refine COVID-19 data, i.e., the Weibo data set, and to extract relevant information. Kang et al. [24] used machine learning techniques to detect coronavirus cases from chest computed tomography (CT) images and obtained satisfactory results. Oh et al. [25] analysed chest X-ray images using a CNN and obtained very effective results in terms of accuracy. Sun and Zhai [26] predicted two indices, adequate social distance and space ventilation, using the Wells-Riley model. They indicated that 1.6–3.0 m is a safe distance, which can reduce the infection rate by 20–40%. Sannigrahi et al. [27] presented a study of the global and local spatial association between key sociodemographic variables and COVID-19 cases. They also analysed deaths in European regions using spatial regression models.

Rumpler et al. [28] studied the impact of a partial lockdown strategy on the spread of COVID-19. They also observed the variation in city noise levels during the associated period. Li et al. [29] explored the variations in the size of confirmed COVID-19 case clusters across the central district of Huangzhou in the prefecture of Huanggang, adjoining Wuhan. Basu et al. [30] examined the changes in noise pollution due to the COVID-19 lockdown in Dublin city and found that noise pollution was reduced by 60% after lockdown. Zhou and Chen [31] presented an investigation of the emerging patent landscape of gene therapies. They also summarized various ideas for controlling the COVID-19 pandemic. Gul and Yucesan [32] proposed an integrated approach using decision-making concepts and interval-valued spherical fuzzy sets for assessing the preparedness of hospitals. Sharma et al. [33] showed the need for artificial intelligence to predict the financial market during the COVID-19 period. Stifanic et al. [34] demonstrated the impact of the coronavirus on stock prices. They proposed a wavelet transform-based approach to forecast stock prices and commodity values. Elhia et al. [35] proposed a nonlinear model to control the spread of the coronavirus in Morocco. Oluwasanmi et al. [36] demonstrated the use of deep learning and an adversarial network to detect COVID-19 pneumonia in computed tomography scans of the lungs and obtained very satisfactory results.

The review shows that there is still scope to evaluate the impact of weather conditions on COVID-19. Deep learning models [37, 38], reinforcement learning [39, 40], computer vision approaches [41, 42], etc. can be utilized for predicting the relationship between COVID-19 and weather conditions [43, 44]. Additionally, deep learning-based approaches can be utilized for the diagnosis of COVID-19 [45]. Inspired by these approaches, in this paper we analyse the impact of various weather factors on COVID-19 cases and deaths. The prediction of COVID-19 growth rates across countries is also considered.

3. Proposed Methodology

This section discusses the methodology used in the present work, the statistics of the data sets, and an overview of the models used. Various environmental factors such as food, water, and climate affect the spread of communicable diseases. The underlying hypothesis is that, because studies of earlier pandemics demonstrated a seasonal pattern linking the number of positive cases and the spread of viruses to climate, COVID-19 might exhibit the same pattern. Furthermore, humidity and temperature vary across seasons and may therefore have a meaningful impact on the spread of the virus. The present work is threefold. First, the impact of various weather factors such as temperature and humidity on COVID-19 cases and deaths is analysed. Second, machine learning techniques, namely the Artificial Neural Network (ANN) and the CNN, are implemented for predicting COVID-19 cases and deaths. Third, how this epidemic will end is predicted.

Figure 2 shows the steps followed in this paper. First, the weather and COVID-19 data sets are collected from different sources. These data need to be preprocessed to remove noise and anomalies, so we applied preprocessing techniques to refine them. We then analysed the weather data to determine the impact of factors such as temperature, humidity, sunlight hours, and wind speed on coronavirus cases and deaths. Not all weather factors necessarily influence COVID-19 cases and deaths. The factors that show no impact on the spread of the coronavirus are then analysed further using machine learning techniques. In parallel, we divide the data set into two parts, the training data and the test data, as shown in Figure 2. A model is trained on the training data and is then validated and evaluated.

In this study, the COVID-19 data set is taken from Kaggle and was compiled by the Johns Hopkins Center for Systems Science and Engineering [46]. It contains official COVID-19 case reports for various countries, with features such as country name, latitude, longitude, date, positive cases, total number of deaths, recovered cases, and active cases. Weather data are taken from a historical weather database [47]. This data set contains country, city, temperature, humidity, sunlight hours, and wind speed. Table 1 gives the descriptive statistics of the weather data set: the total number of observations (N), the mean, the maximum value (Max), the minimum value (Min), and the standard deviation (Std. Dev.) of each feature. A total of 16557 observations are used for the analysis.
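The descriptive statistics listed for Table 1 can be computed for any single feature with a few lines of Python. The following is a minimal sketch under stated assumptions: the function name `describe` and the sample temperature values are hypothetical and are not taken from the data set, and the sample (n − 1) estimator is assumed for the standard deviation.

```python
import statistics

def describe(values):
    """Return N, mean, max, min, and standard deviation of one feature."""
    return {
        "N": len(values),
        "Mean": statistics.mean(values),
        "Max": max(values),
        "Min": min(values),
        # Sample (n - 1) standard deviation; the estimator used in the
        # paper is not stated, so this choice is an assumption.
        "Std. Dev.": statistics.stdev(values),
    }

# Hypothetical daily mean temperatures, for illustration only.
temperatures = [12.0, 15.0, 9.0, 21.0, 18.0]
summary = describe(temperatures)
print(summary)
```

Applying `describe` to each weather column in turn would give one row of such a table per feature.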

We used regression techniques to analyse the correlation of weather parameters such as temperature, humidity, wind speed, and sunlight hours with coronavirus cases and deaths. An ANN and a CNN were also developed to analyse the COVID-19 data. A brief introduction to these machine learning techniques is given below.

3.1. Artificial Neural Network

In an ANN, each node gathers the inputs received from the nodes connected to it and computes its output by combining these inputs with weights and applying a simple function. Neural networks come in different shapes and architectures. Depending on the complexity of the problem, the user specifies the network architecture, which includes the number of hidden layers, the number of nodes in each hidden layer, and their connectivity. ANNs can be trained using supervised or unsupervised techniques. The relationship between a node's output and its set of inputs is defined by the activation function. We used the sigmoid activation function, which has an S-shaped characteristic curve. It maps a set of inputs to outputs in the range 0 to 1, which can be read as probabilities, and is mathematically defined as follows:

\[ f(x) = \frac{1}{1 + e^{-x}} \]
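The weighted-sum computation of a single node with a sigmoid activation can be sketched as follows. This is an illustrative example only: the function names, the bias term, and the input and weight values are assumptions and do not come from the implemented networks.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: maps a real input to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights, bias):
    """Output of one node: sigmoid of the weighted sum of inputs plus a bias."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical inputs and weights, for illustration only.
out = node_output([0.5, -1.0], [0.8, 0.2], bias=0.1)
print(out)
```

A full network repeats this computation for every node in every layer, feeding the outputs of one layer to the next.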

3.2. Convolutional Neural Network

A CNN is a deep learning technique that assigns learnable weights and biases to various aspects of the observations and uses them to distinguish one observation from another. It needs less preprocessing than other classification algorithms. A CNN applies filters to capture the spatial and temporal dependencies in the observations. There are various CNN architectures, such as LeNet, AlexNet, VGGNet, GoogLeNet, ResNet, and ZFNet. CNN layers apply kernels to extract specific patterns, which are then passed to a max-pooling layer.
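The kernel and max-pooling operations described above can be illustrated with a one-dimensional sketch. It is not the network used in this work: the function names, the kernel, and the input signal are hypothetical, and a single fixed kernel stands in for the learned filters of a real CNN.

```python
def conv1d(signal, kernel):
    """Valid (no padding) 1-D cross-correlation, which CNN layers call convolution."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def max_pool1d(features, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [
        max(features[i:i + size])
        for i in range(0, len(features) - size + 1, size)
    ]

# Hypothetical input signal and difference kernel, for illustration only.
signal = [1, 3, 2, 5, 4, 6, 1]
kernel = [1, 0, -1]
feature_map = conv1d(signal, kernel)       # pattern extracted by the kernel
pooled = max_pool1d(feature_map, size=2)   # downsampled feature map
print(feature_map, pooled)
```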

4. Experimental Results and Analysis

To perform the experiments, the entire data set is first split into training data (75% of the total) and testing data (25% of the total). All machine learning techniques are then trained on the training data and evaluated on the test data. We used a data resampling method to split the data; more precisely, we used K-fold cross-validation with K initially set to 10. In 10-fold cross-validation, the data set is partitioned into 10 subsets; in each iteration, one subset is selected for testing and the remaining subsets are used for training. A model is fitted on the current training subsets and then assessed on the selected test subset. Hyperparameter tuning can also be carried out with this technique. The best hyperparameter configuration is selected using scikit-learn's grid search function together with cross-validation.
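The splitting scheme can be sketched in plain Python as shown below. This sketch assumes that the 10-fold cross-validation is applied within the 75% training portion, which the text does not state explicitly; the function names, the seed, and the sample size of 200 are hypothetical. In practice, library utilities such as scikit-learn's would normally replace these helpers.

```python
import random

def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle indices 0..n-1 and split them into training and test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n * test_fraction))
    return indices[n_test:], indices[:n_test]

def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists; each fold is the validation set once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

# Hypothetical data set of 200 observations, for illustration only.
train_idx, test_idx = train_test_split_indices(200)
folds = list(k_fold_indices(train_idx, k=10))
print(len(train_idx), len(test_idx), len(folds))
```

Each candidate hyperparameter setting would be scored by averaging its error over the 10 validation folds, and the held-out test indices would be used only for the final evaluation.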

First, we analysed the impact of various weather factors on the spread of COVID-19. Table 2 presents the summary regression statistics for the dependent variable, confirmed COVID-19 cases. These statistics include the multiple R, multiple R2, and adjusted R2 values. Multiple R is the multiple correlation coefficient, which indicates the strength of the linear relationship between the dependent variable and the predictors. As per Table 2, the value of multiple R is 0.3112. Multiple R2 is the coefficient of determination, which represents the proportion of the variance in the dependent variable that is explained by the fitted regression. Table 2 shows that approximately 17% of the variation of the y-values about their mean is explained by the x-values. As there is more than one independent variable in the data set, the adjusted R2 is also computed, and it is approximately 16%.
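The three fit statistics can be computed from observed and predicted values as in the following sketch. The function name and the sample values are hypothetical; the identity that multiple R is the square root of R2 is assumed to hold, which is the case for a least-squares fit with an intercept.

```python
import math

def regression_fit_statistics(y_true, y_pred, n_predictors):
    """Return multiple R, R squared, and adjusted R squared of a fitted regression."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total variation
    r2 = 1.0 - ss_res / ss_tot
    # The adjustment penalizes R squared for the number of predictors.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return math.sqrt(max(r2, 0.0)), r2, adj_r2

# Hypothetical observed and predicted values, for illustration only.
y_true = [2.0, 4.0, 6.0, 8.0, 10.0]
y_pred = [2.5, 3.5, 6.0, 8.5, 9.5]
multiple_r, r2, adj_r2 = regression_fit_statistics(y_true, y_pred, n_predictors=1)
print(multiple_r, r2, adj_r2)
```

Because the adjustment depends on the number of predictors, the adjusted value is always at most the unadjusted one.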

In the present work, the impact of the individual independent variables (latitude, longitude, humidity, sunlight hours, temperature, and wind speed) on the dependent variable, i.e., confirmed COVID-19 cases, is also analysed. Table 3 shows the dependency of confirmed COVID-19 cases on each individual predictor. This dependency is reported in terms of the standardized regression coefficients and their standard errors, the raw regression coefficients (b) and their standard errors, the t-statistic (t), and the p-value. These values are used to compare the relative influence of each predictor on the estimate of the dependent variable (confirmed COVID-19 cases). The predictors latitude, longitude, humidity, hours of sunlight, temperature, and wind speed have a significant impact on the spread of the coronavirus, as the p-value is less than 0.05 (95% confidence level) for these predictors. The raw (unstandardized) regression coefficient (b) for hours of sunlight has a magnitude of 17.803, which means that if all other predictors are held constant, an increase of one unit in hours of sunlight decreases the coronavirus-infected cases by 17.803. Similarly, the value of b for humidity is 14.723, which means that if all other independent variables are held constant, a unit increase in humidity increases the coronavirus-infected cases by 14.723. Likewise, the regression coefficient (b) for temperature is −8.938, which means that if all other predictors are held constant, a one-unit increase in temperature decreases the coronavirus cases by 8.938. The same type of analysis can be carried out for the other predictors.
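The "all other predictors held constant" reading of a raw coefficient amounts to multiplying the coefficient by the change in the predictor. The sketch below uses the humidity and temperature coefficients quoted above; the dictionary layout and the function name are hypothetical, and no other entries of Table 3 are assumed.

```python
# Raw regression coefficients (b) quoted in the text for two predictors.
raw_coefficients = {"humidity": 14.723, "temperature": -8.938}

def expected_change(predictor, delta):
    """Change in predicted confirmed cases when one predictor moves by
    `delta` units and all other predictors are held constant."""
    return raw_coefficients[predictor] * delta

# A one-unit rise in temperature and a two-unit rise in humidity.
print(expected_change("temperature", 1.0))
print(expected_change("humidity", 2.0))
```

A negative result indicates a predicted decrease in cases, and a positive result indicates a predicted increase.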

We also computed the correlation of mean temperature, humidity, and wind speed with the active COVID-19 cases and the total deaths due to COVID-19. The correlation among these features is represented by the heatmap in Figure 3. The colours of the heatmap tiles indicate the degree of correlation between the labels on the x-axis and the y-axis. A lighter shade represents a high correlation, whereas a darker shade represents little or no correlation. The map reveals that, worldwide, these features show very little correlation with the COVID-19 cases. Because the geographical conditions of each country are different, a worldwide correlation with these parameters is difficult to obtain.
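The values behind such a heatmap form a pairwise correlation matrix. The sketch below assumes the Pearson correlation coefficient, which the text does not name explicitly; the function names and all sample values are hypothetical and are not taken from the data sets.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Pairwise correlations between named columns, as a nested dictionary."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical feature values, for illustration only.
data = {
    "temperature": [10.0, 15.0, 20.0, 25.0, 30.0],
    "humidity": [80.0, 70.0, 75.0, 60.0, 55.0],
    "active_cases": [120.0, 90.0, 100.0, 60.0, 50.0],
}
matrix = correlation_matrix(data)
print(matrix["temperature"]["active_cases"])
```

Each cell of the matrix corresponds to one tile of the heatmap, with the diagonal equal to 1.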

The above table and figure show the impact of individual weather factors on COVID-19 cases. To understand this impact in more detail, we also used visual diagrams. Figure 4 shows the association between temperature and the spread of the coronavirus. In this figure, the size of each circle shows the number of confirmed coronavirus cases, and the colour represents temperature. The figure clearly indicates that confirmed coronavirus cases occur worldwide. Although infections also occur at high temperatures, the number of confirmed cases may be lower.

Figure 5 presents the relationship between temperature and COVID-19 deaths. It is clear from this figure that COVID-19 deaths are not strongly related to temperature. COVID-19 deaths are high at the beginning of the spread in each country but decline later.

Figure 6 shows the coronavirus spread against humidity. It is clear from the figure that Europe has high humidity and also a high coronavirus spread. Similarly, Figure 7 shows the relationship between humidity and COVID-19 deaths. It can be seen from this figure that regions with low humidity also have fewer COVID-19 deaths, with a few exceptions.

Figure 8 shows the plot of raw residuals against confirmed cases with a 95% confidence interval. A residual is the observed value minus the predicted value. A positive residual (on the y-axis) means that the prediction is too low, a negative residual means that the prediction is too high, and a residual of 0 means that the prediction is exact. According to Figure 8, the predicted values are accurate, as the observations lie along the regression line. Similarly, Figure 9 presents the plot of raw residuals against COVID-19 deaths.
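The sign convention for residuals can be checked with a short sketch. The function name and the observed and predicted values are hypothetical and serve only to illustrate the three cases.

```python
def raw_residuals(observed, predicted):
    """Raw residual for each observation: observed value minus predicted value."""
    return [o - p for o, p in zip(observed, predicted)]

# Hypothetical observed and predicted case counts, for illustration only.
observed = [100.0, 80.0, 60.0]
predicted = [90.0, 85.0, 60.0]
residuals = raw_residuals(observed, predicted)

# Positive residual: prediction too low; negative: too high; zero: exact.
labels = [
    "too low" if r > 0 else "too high" if r < 0 else "exact"
    for r in residuals
]
print(list(zip(residuals, labels)))
```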

Table 4 summarizes the implemented ANNs in terms of network structure, training algorithm, error function, hidden activation function, and output activation function. The training, testing, and validation errors of all developed ANNs are also presented in this table. In this work, a total of 5 different ANNs are implemented, as shown in Table 4. We used the SOS error function in all implemented ANNs. The identity hidden activation function is used for the first three ANNs, whereas the exponential hidden activation function is used for the last two. We used the logistic output activation function for the first three ANNs and the identity output activation function for the last two. The BFGS 15 training algorithm is used in the first ANN, whereas BFGS 17 is used in the second and third ANNs. BFGS 40 and BFGS 35 are used in the fourth and fifth ANNs, respectively, as shown in Table 4.

Table 5 summarizes the results for the implemented CNN. We implemented a CNN with a multilayer perceptron (MLP) 311-8-2 structure. The BFGS 4 training algorithm is used in the developed CNN along with the SOS error function. We used the Tanh function as the hidden activation function and the identity function as the output activation function. Table 5 shows that the CNN gives better results (lower training, testing, and validation errors) than all five ANN models.
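The error and activation functions named in Tables 4 and 5 can be written down as in the sketch below. It assumes that SOS denotes the plain sum of squared differences without a scaling factor, which the text does not specify. The exponential activation is omitted because its exact form is not given. All names and sample values are hypothetical.

```python
import math

def sum_of_squares_error(targets, outputs):
    """Sum-of-squares (SOS) error between target values and network outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))

# Activation functions named in the text; the dictionary layout is an assumption.
ACTIVATIONS = {
    "identity": lambda z: z,
    "logistic": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
}

# Hypothetical targets and pre-activation values, for illustration only.
targets = [1.0, 0.0, 1.0]
outputs = [ACTIVATIONS["logistic"](z) for z in (2.0, -2.0, 0.0)]
error = sum_of_squares_error(targets, outputs)
print(error)
```

A training algorithm such as BFGS adjusts the network weights to reduce this error on the training data.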

The search for a coronavirus vaccine is ongoing, and many major pharmaceutical companies and research firms are working to find an effective one. It is still unclear how and when this epidemic will end and how many patients will successfully recover from the disease. The main motive of this part of the research is to estimate how many patients will recover by analysing the records of previously recovered patients. To do so accurately, patient recovery records from various countries are taken into consideration. The work is oriented towards understanding how this epidemic is going to end. Since no cure or vaccine is available to date, this work can provide insight into the number of patients that will recover, based on past records. In this way, we will be able to predict when the current coronavirus situation will end.

5. Conclusions and Future Works

In this work, we studied the impact of different weather factors on confirmed COVID-19 cases and deaths using regression. The results show that temperature, sunlight hours, and wind speed have a significant impact on the spread of COVID-19 and on deaths, whereas humidity does not affect confirmed coronavirus cases and deaths significantly. Moreover, we developed five ANNs and one CNN to analyse the COVID-19 data set. The experimental results show that the CNN performs better than all five ANNs. Identifying the factors that affect the spread of the coronavirus is the main contribution of this paper. This study can help in preventing the spread of the coronavirus. It can also be used to raise general awareness and understanding of the issues affecting the worldwide spread of this deadly virus. Finally, a conceptual analysis was given of how this epidemic will end.

In future work, other weather factors such as rainfall can be considered to study the spread of COVID-19. The current study can also be extended with further analyses and by fine-tuning the prediction and visualization techniques. Moreover, other deep learning techniques can be developed for analysing COVID-19 data. Further, the impact of vaccination on the worldwide spread of the coronavirus can be analysed.

Data Availability

The data used in the paper are available at the following link: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the present study.

Acknowledgments

The authors are thankful to the School of Computing, University of Eastern Finland, Kuopio, Northern Europe, 70211, Finland, for financial assistance and support.