1 Introduction

In the last two decades, the world has faced several epidemics: SARS in 2002, swine flu in 2009, Ebola in 2013, MERS in 2014, and now COVID-19 in 2019 [1,2,3,4,5]. Such epidemics can cause severe human and economic losses. The latest of them, COVID-19, emerged in China in late December 2019. The first case of this new, previously unseen virus was reported in Wuhan, China. It was initially called the Wuhan virus and was later designated COVID-19, the novel coronavirus, or 2019-nCoV [6]. According to one hypothesis, the virus originated at a seafood market in Wuhan, probably from bats. It first infected people in Wuhan and spread to other cities within a short span [7]. Within fifteen days, China was severely affected, and within a month, the disease had become a global epidemic. Initially, it was claimed to be an airborne disease, but after further assessment, scientists reported that it spreads through contact and that the virus can persist for hours to days in different environments and on different surfaces. Because it spreads through contact, it can be transmitted through human-to-human interaction or human-to-surface contact. Owing to the rapid worldwide spread of COVID-19, the World Health Organization (WHO) [8] issued a global health emergency warning in the last week of January 2020, and on 11 March 2020 it declared COVID-19 a pandemic [9].

The novel coronavirus that causes COVID-19 belongs to the same virus family as the viruses responsible for Severe Acute Respiratory Syndrome (SARS-CoV) and Middle East Respiratory Syndrome (MERS-CoV). It is a new human disease that emerged at the end of December 2019 [10]. Because it had not been identified in humans before 2019, little was known about it. COVID-19 is a zoonotic disease, meaning that it was transmitted from animals to people. This hypothesis is plausible because several known coronaviruses circulate in animals without having infected humans yet. Earlier studies found that SARS-CoV and MERS-CoV were also transmitted from animals: SARS-CoV from civet cats and MERS-CoV from camels [1,2,3,4,5]. These viruses emerged through mutation, that is, they changed their properties before passing from animals into the human body.

Table 1 lists several virus-based epidemic diseases [1,2,3,4,5]. For each disease, the table gives its name, year of occurrence, total cases, total deaths, total recovered people, and the number of countries in which the virus was reported.

Table 1 Virus-based epidemic diseases to date

In COVID-19, the patient initially experiences common flu-like symptoms, i.e., fever, cough, shortness of breath, and breathing difficulties, which can become more severe within a couple of days. The disease targets the human respiratory system; in other words, it is primarily a respiratory disease. In severe cases, it can cause acute respiratory syndrome, pneumonia, kidney failure, other severe respiratory diseases, and even death. To protect ourselves from COVID-19 and to stop its spread, we must take precautions such as covering the nose and mouth when sneezing or coughing, washing hands regularly and properly, and eating only adequately cooked eggs and meat. Other measures include using hand sanitizer and wearing a mask in public places, not spitting on the road, avoiding close contact with ill people, avoiding public transport, and staying at home as much as possible [5, 8, 11].

Fig. 1 shows the worldwide situation caused by the COVID-19 outbreak as a bar graph. The x-axis shows the categories (total cases, recovered cases, active cases, and deaths caused by COVID-19), and the y-axis shows the number of people in each category [11].

Fig. 1 COVID-19 Situation Worldwide

In an epidemic, even a small finding can help a great deal. Forecasting the coronavirus outbreak therefore plays a vital role in this challenging pandemic: it indicates how widely the disease may spread in the coming days, which helps governments take preventive measures to minimize its spread [12, 13]. Such forecasting requires a model that is accurate, efficient, and widely applicable. Building one is challenging, and the lack of sufficient real-time data makes it even more so. With these limitations in mind, we began by testing basic forecasting models, which were not as accurate as required [14]. After several trials, we found that our proposed Enhanced Multi-Task Gaussian Process Regression (MTGP) model forecasts the COVID-19 outbreak better than the traditional forecasting models.

The primary aims of this study are:

  • To develop a model capable of dealing with this epidemic situation and producing the best possible forecasting results.

  • To provide information on likely conditions in the coming days due to the COVID-19 outbreak, which will help in planning preventive measures accordingly.

  • To discuss the significance of IoT in COVID-19 detection and possible IoT-based solutions for minimizing the impact of COVID-19.

The rest of this paper is structured as follows. Section 2 presents related work. Section 3 describes the dataset and the proposed methodology, along with four traditional forecasting models. Section 4 presents the experimental results of the performance evaluation on the COVID-19 dataset. Section 5 discusses the results in detail, together with the significance of IoT in the detection of COVID-19 and possible IoT-based solutions. Section 6 concludes the paper and outlines the scope for future work.

2 Related work

The complexity of developing epidemiological models has encouraged researchers to turn to machine learning-based models for dealing with epidemics. The primary aim of machine learning models is to achieve better prediction reliability and better generalization ability [15,16,17,18,19]. Various machine learning models have been used to predict epidemics such as Ebola, SARS, swine flu, cholera, H1N1 influenza, Zika, dengue fever, oyster norovirus, and MERS [20,21,22,23,24,25,26,27,28,29,30]. These studies were limited to a few basic models, such as Neural Networks, Random Forest Regression, Bayesian Networks, Genetic Programming, Naïve Bayes, Classification and Regression Trees, Linear Regression, and Support Vector Regression. Machine learning models have also long been used for natural disaster and weather prediction [31, 32].

Various researchers and research groups have proposed mathematical and algorithmic approaches for dealing with the COVID-19 epidemic. Roosa et al. [33] used a sub-epidemic model, the Richards model, and a logistic model to predict confirmed cases over the next 5 and 10 days in Zhejiang and Guangdong provinces. Liu et al. [34] used a mathematical model to simulate the propagation of COVID-19 in China; such a model can help governments minimize the impact of the pandemic. Boldog et al. [35] adopted a time-dependent compartmental model of transmission dynamics to estimate confirmed cases outside China's Hubei province. Al-Qaness et al. [36] proposed an ANFIS (Adaptive Neuro-Fuzzy Inference System) model optimized by an improved flower pollination algorithm combined with the salp swarm algorithm to predict confirmed cases over the next ten days, and evaluated it against GA (Genetic Algorithm), PSO (Particle Swarm Optimization), ABC (Artificial Bee Colony), and FPA (Flower Pollination Algorithm). Jung et al. [37] proposed a mathematical model based on statistical inference and delay-distribution estimation for predicting COVID-19 cases. Fan et al. [38] proposed an analytical mathematical model for predicting the floating population of Wuhan and established a correlation between the floating population and daily confirmed cases.

Deep learning algorithms are widely used for predicting the propagation of COVID-19, including genome-based prediction. Yang et al. [39] combined the SEIR (Susceptible-Exposed-Infectious-Removed) model with an LSTM (Long Short-Term Memory) model to model the epidemic curve. Hu et al. [40] proposed an improved stacked autoencoder together with a clustering algorithm to group the confirmed cases of each province, and found that AI-based methods achieve higher accuracy in predicting the COVID-19 trajectory. Guo et al. [41] used a deep learning algorithm for gene sequence-based virus prediction, comparing the gene sequences of SARS-CoV (Severe Acute Respiratory Syndrome coronavirus), the bat SARS coronavirus, and MERS-CoV (Middle East Respiratory Syndrome coronavirus) with that of COVID-19 to find the similarity among the viruses.

Riou et al. [42] predicted the early outbreak trajectories to determine the reproduction number of COVID-19, and also pointed out the possibility of frequent infections among people. Ming et al. [43] proposed an improved SIR (Susceptible-Infectious-Recovered) model to predict COVID-19 infection cases, and estimated the actual load on ICUs (Intensive Care Units) under various public health intervention efficacies and diagnosis rates. Zhao et al. [44] proposed a probability model for accurately predicting the infection nodes from snapshots of the spread. Fountain-Jones et al. [45] introduced machine learning-based pathogen-risk models for predicting the COVID-19 outbreak using various machine learning models. Benvenuto et al. [46] used the ARIMA (Autoregressive Integrated Moving Average) model to predict the trend and morbidity of the COVID-19 outbreak, and noted that if the virus does not mutate, the number of cases will reach a plateau.

3 Materials and methods

This section describes the materials and methods used to obtain the results. It is divided into four subsections: dataset description, proposed methodology, model comparison, and statistical analysis. The first subsection describes the COVID-19 dataset in detail. The second presents the proposed MTGP model. The third briefly discusses the four baseline prediction algorithms and their mathematical foundations. The last describes the performance evaluation metrics.

3.1 Data

For this research work, we used the WHO health bulletins. The dataset was compiled from the WHO situation reports, and all experimental evaluations were performed on it. Approximately 213 countries have been affected by COVID-19. The data used to build the forecasting models cover the period from 31/12/2019 to 25/06/2020. The dataset consists of nine columns: country/territory/area name, date, total number of confirmed cases, total number of deaths, number of new confirmed cases, number of new deaths, total number of recovered patients, total number of active patients, and transmission classification [11]. These nine columns were recorded for every affected country.

Fig. 2 visualizes the dataset: the total number of confirmed cases from 31/12/2019 to 25/06/2020 is drawn on a world map. The radius of each circle depends on the extent of the outbreak (number of confirmed cases) in the respective country. For the map, each country's geographical location was mapped to its number of confirmed cases.

Fig. 2 COVID-19 Outbreak Worldwide

3.2 Proposed methodology

Gaussian Process Regression (GPR) is a non-parametric regression model that is commonly used in predictive analysis [14, 47]. It can also be applied to complex, arbitrary systems. Multi-Task Gaussian Process Regression (MTGP) is an extension of the basic GPR model; it can also be viewed as an advanced form, or special case, of the standard GPR model [48]. It is applicable when multiple outputs are required from a standard GPR model. The MTGP model was proposed by Bonilla et al. in 2008 [49], and an advanced version was introduced by Dürichen et al. for the analysis of multivariate physiological time series [50]. More recently, Richardson et al. used this method to predict battery capacity with improved results [51]. For forecasting the COVID-19 outbreak with the proposed MTGP model, the COVID-19 time series data are given as input, and the historical and reference series are obtained as output. For time series prediction, the MTGP model follows the same training and testing mechanism as the traditional GPR model, except for the kernel matrix. To illustrate the kernel matrix of the MTGP model, the case of two output tasks is considered as an example.

The standard GPR model is described in Eq. (1)

$$ a=f(b)\sim N\left(m(b),i\left(b,b^{\prime}\right)\right) $$
(1)

where a and b denote the output and input, respectively, m(b) is the mean function, f(b) is the latent variable, and i(b, b′) is the covariance function.

The squared exponential kernel (SEK) function is defined as,

$$ {i}_{SEK}={\theta}_1^2\exp \left(-\frac{d^2}{\theta_2^2}\right) $$
(2)

where d denotes the Euclidean distance between b and b′, and θ1 and θ2 are hyperparameters to be optimized.
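To make Eqs. (1) and (2) concrete, the sketch below builds the SE covariance matrix for 1-D inputs (for example, day indices) and draws sample functions from a GP prior. It is an illustrative NumPy sketch, not the authors' code: the zero mean function m(b) = 0, the function names `se_kernel` and `sample_gp_prior`, and the small diagonal jitter are assumptions made here.

```python
import numpy as np


def se_kernel(b1, b2, theta1=1.0, theta2=1.0):
    """SE kernel of Eq. (2) for 1-D inputs: theta1^2 * exp(-d^2 / theta2^2)."""
    b1 = np.asarray(b1, dtype=float)
    b2 = np.asarray(b2, dtype=float)
    d2 = (b1[:, None] - b2[None, :]) ** 2  # squared Euclidean distances
    return theta1 ** 2 * np.exp(-d2 / theta2 ** 2)


def sample_gp_prior(b, n_samples=3, theta1=1.0, theta2=1.0, seed=0):
    """Draw functions f(b) ~ N(m(b), i(b, b')) as in Eq. (1).

    Assumption: zero mean function m(b) = 0.
    """
    b = np.asarray(b, dtype=float)
    K = se_kernel(b, b, theta1, theta2)
    # Assumption: a small jitter on the diagonal keeps the Cholesky factorization stable
    L = np.linalg.cholesky(K + 1e-6 * np.eye(len(b)))
    z = np.random.default_rng(seed).standard_normal((len(b), n_samples))
    return (L @ z).T  # one sampled function per row


days = np.arange(0, 30, dtype=float)
samples = sample_gp_prior(days, n_samples=3, theta1=2.0, theta2=5.0)
print(samples.shape)  # (3, 30)
```

Larger values of θ2 produce smoother sampled curves, while θ1 scales their amplitude; these are the two quantities that the likelihood-based fitting step has to estimate.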

The hyperparameters of the kernel matrix I are estimated by minimizing the negative log marginal likelihood (NLML), as shown in Eq. (3), where \( {\sigma}_n^2 \) is the noise variance, J is the identity matrix, and n is the number of observations:

$$ {\displaystyle \begin{array}{c} NLML=-\log \left(p\left(a|b,\theta \right)\right)\\ {} NLML=\frac{1}{2}\log \left|I+{\sigma}_n^2J\right|+\frac{1}{2}{a}^T{\left(I+{\sigma}_n^2J\right)}^{-1}a+\frac{n}{2}\log \left(2\pi \right)\end{array}} $$
(3)
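In practice, the NLML of Eq. (3) is usually evaluated through a Cholesky factorization instead of an explicit matrix inverse. The sketch below does this for the single-task SE kernel of Eq. (2). As a simple stand-in for a gradient-based optimizer, it selects θ1 and θ2 by grid search while keeping σn fixed. The grid, the fixed noise level, and all function names are illustrative assumptions and do not reflect the settings used in the paper.

```python
import numpy as np


def se_kernel(b1, b2, theta1, theta2):
    """SE kernel of Eq. (2) for 1-D inputs."""
    d2 = (np.asarray(b1, float)[:, None] - np.asarray(b2, float)[None, :]) ** 2
    return theta1 ** 2 * np.exp(-d2 / theta2 ** 2)


def nlml(b, a, theta1, theta2, sigma_n):
    """Negative log marginal likelihood of Eq. (3)."""
    a = np.asarray(a, float)
    n = len(a)
    K = se_kernel(b, b, theta1, theta2) + sigma_n ** 2 * np.eye(n)  # I + sigma_n^2 J
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, a))  # (I + sigma_n^2 J)^{-1} a
    # 0.5 * log|K| equals the sum of the logs of the Cholesky diagonal
    return 0.5 * a @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2 * np.pi)


def fit_by_grid(b, a, sigma_n=0.1, grid=np.logspace(-1, 1, 15)):
    """Assumption: coarse grid search over (theta1, theta2) with sigma_n held fixed."""
    best = min((nlml(b, a, t1, t2, sigma_n), t1, t2) for t1 in grid for t2 in grid)
    return best[1], best[2]


b = np.linspace(0, 5, 30)
a = np.sin(b)
theta1, theta2 = fit_by_grid(b, a)
print(theta1, theta2)
```

Computing the factorization once per evaluation yields both the determinant term and the quadratic term of Eq. (3). It is also numerically safer than inverting \( I+{\sigma}_n^2J \) directly.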

The MTGP regression model for two output tasks is formulated in Eqs. (4) and (5).

$$ \left(\genfrac{}{}{0pt}{}{a_r}{a_p}\right)\sim N\left[0,\begin{pmatrix}{\theta}_{rr}I\left({b}_r,{b}_r\right) & {\theta}_{rp}I\left({b}_r,{b}_p\right)\\ {\theta}_{pr}I\left({b}_p,{b}_r\right) & {\theta}_{pp}I\left({b}_p,{b}_p\right)\end{pmatrix}\right] $$
(4)
$$ \left(\genfrac{}{}{0pt}{}{a_r}{a_p}\right)\sim N\left(0,{I}_{MTGP}\left(b,b\right)\right) $$
(5)

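where \( {a}_r \) and \( {a}_p \) are the two output series and \( {b}_r \) and \( {b}_p \) are the corresponding inputs, which have the same dimensionality (in the 1-D case, \( {b}_r \) and \( {b}_p \) serve as time indices for the two time series), and

\( {\theta}_{rr} \), \( {\theta}_{rp} \), \( {\theta}_{pr} \), and \( {\theta}_{pp} \) are the correlation coefficients between the output series.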

$$ {F}_p^{\ast }={I}_{MTGP}\left(b,{b}_p^{\ast}\right)\ {\left({I}_{MTGP}\left(b,b\right)+{\sigma}_n^2J\right)}^{-1}\left(\genfrac{}{}{0pt}{}{a_r}{a_p}\right)={X}_{MTGP}\left(\genfrac{}{}{0pt}{}{a_r}{a_p}\right) $$
(6)

Where,

$$ {\displaystyle \begin{array}{c}{I}_{MTGP}\left(b,b\right)={\begin{pmatrix}{\theta}_{rr}I\left({b}_r,{b}_r\right) & {\theta}_{rp}I\left({b}_r,{b}_p\right)\\ {\theta}_{pr}I\left({b}_p,{b}_r\right) & {\theta}_{pp}I\left({b}_p,{b}_p\right)\end{pmatrix}}_{\left(r+p\right)\times \left(r+p\right)}\\ {}{I}_{MTGP}\left(b,{b}_p^{\ast}\right)={\begin{pmatrix}{\theta}_{pr}I\left({b}_p^{\ast },{b}_r\right) & {\theta}_{pp}I\left({b}_p^{\ast },{b}_p\right)\end{pmatrix}}_{15\times \left(r+p\right)}\end{array}} $$
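The two-task construction of Eqs. (4) to (6) can be sketched in a few lines of NumPy. The block kernel scales the SE sub-kernels by the task correlations θ. The predictive mean for task p is then obtained with a single linear solve against all observations of both tasks. This is an illustration only and does not reproduce the enhanced kernel of the proposed model, which the text does not fully specify. The following are assumptions: the SE sub-kernel; the 2 × 2 matrix `T` that holds θrr, θrp, θpr, θpp, assumed symmetric and positive semi-definite so that the block matrix is a valid covariance; the noise level; and all function names.

```python
import numpy as np


def se_kernel(b1, b2, theta1=1.0, theta2=1.0):
    """SE sub-kernel I(b1, b2) of Eq. (2) for 1-D inputs."""
    d2 = (np.asarray(b1, float)[:, None] - np.asarray(b2, float)[None, :]) ** 2
    return theta1 ** 2 * np.exp(-d2 / theta2 ** 2)


def mtgp_cov(b_r, b_p, T, **kw):
    """Block covariance I_MTGP(b, b); T = [[th_rr, th_rp], [th_pr, th_pp]] (assumed PSD)."""
    return np.block([
        [T[0, 0] * se_kernel(b_r, b_r, **kw), T[0, 1] * se_kernel(b_r, b_p, **kw)],
        [T[1, 0] * se_kernel(b_p, b_r, **kw), T[1, 1] * se_kernel(b_p, b_p, **kw)],
    ])


def mtgp_predict_p(b_r, a_r, b_p, a_p, b_star, T, sigma_n=1e-2, **kw):
    """Predictive mean of task p at new inputs b_star, following Eq. (6)."""
    K = mtgp_cov(b_r, b_p, T, **kw)
    # Cross-covariance row block: (th_pr I(b*, b_r)  th_pp I(b*, b_p))
    K_star = np.hstack([T[1, 0] * se_kernel(b_star, b_r, **kw),
                        T[1, 1] * se_kernel(b_star, b_p, **kw)])
    a = np.concatenate([a_r, a_p])  # stacked outputs (a_r, a_p)
    alpha = np.linalg.solve(K + sigma_n ** 2 * np.eye(len(a)), a)
    return K_star @ alpha


# Hypothetical example: a reference series r and a target series p on shared day indices
days = np.arange(20, dtype=float)
T = np.array([[1.0, 0.9], [0.9, 1.0]])
a_r = np.cos(days / 4.0)
a_p = np.sin(days / 4.0)
future = np.arange(20, 35, dtype=float)  # 15 steps ahead, matching the 15 x (r + p) shape
forecast = mtgp_predict_p(days, a_r, days, a_p, future, T, theta2=4.0)
print(forecast.shape)  # (15,)
```

The off-diagonal entries of `T` are what allow observations of series r to inform the forecast of series p. If they are set to zero, the block matrix becomes block-diagonal, and the prediction falls back to an independent single-task GPR on \( {a}_p \).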

The proposed MTGP algorithm for COVID-19 outbreak forecasting is given in Algorithm 1.

Algorithm 1

3.3 Model comparison

This section briefly presents the four baseline prediction algorithms used in the experimental evaluation, together with their mathematical foundations.

3.3.1 Linear regression (LR)

Linear Regression (LR) is one of the most commonly used regression models for predictive analysis. It establishes the relationship between two variables, a dependent (response) variable and an explanatory (descriptive) variable, expressed by the slope-intercept equation of a line [52].

The LR model is given in Eq. (7), where y is the dependent variable, x is the explanatory variable, b is the slope of the line, and a is the y-intercept:

$$ y=a+ bx $$
(7)

The least-squares slope b and intercept a are calculated as follows:

$$ b(slope)=\frac{n\sum xy-\left(\sum x\right)\left(\sum y\right)}{n\sum {x}^2-{\left(\sum x\right)}^2} $$
(8)
$$ a(intercept)=\frac{\sum y-b\left(\sum x\right)}{n} $$
(9)
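Eqs. (7) to (9) can be translated directly into code. The sketch below computes the closed-form least-squares slope and intercept with NumPy. The function names `fit_lr` and `predict_lr` and the toy data are illustrative assumptions.

```python
import numpy as np


def fit_lr(x, y):
    """Closed-form least-squares fit of y = a + b x using Eqs. (8) and (9)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    b = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x ** 2) - x.sum() ** 2)  # Eq. (8)
    a = (y.sum() - b * x.sum()) / n  # Eq. (9)
    return a, b


def predict_lr(a, b, x):
    """Eq. (7): y = a + b x."""
    return a + b * np.asarray(x, dtype=float)


a, b = fit_lr([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
print(predict_lr(a, b, [4, 5]))  # [ 9. 11.]
```

With x taken as a day index and y as a case count, the fitted line extrapolates a constant daily increase. This is one reason a purely linear baseline struggles with the non-linear growth of an outbreak.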

3.3.2 Support vector regression (SVR)

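Support Vector Regression (SVR) is another widely used regression model for predictive analysis. It fits a function that keeps as many training points as possible within an ε-insensitive tube, while keeping the model as flat as possible.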

The SVR prediction function is given in Eq. (10) [53].

$$ {\displaystyle \begin{array}{c}f(x)=\omega \varphi (x)+b\\ {}\varphi :{R}^n\to F,\kern0.5em \omega \in F\end{array}} $$
(10)

where φ maps the input into a high-dimensional feature space F, and ω and b are the coefficients (weight vector and bias) of φ(x).

$$ {R}_{SVR}(c)={R}_{emp}+\frac{1}{2}{\left\Vert \omega \right\Vert}^2=c\times \frac{1}{n}\sum \limits_{i=1}^n{L}_{\varepsilon}\left({d}_i,{y}_i\right)+\frac{1}{2}{\left\Vert \omega \right\Vert}^2 $$
(11)

where the empirical risk \( {R}_{emp}=c\times \frac{1}{n}\sum \limits_{i=1}^n{L}_{\varepsilon}\left({d}_i,{y}_i\right) \) measures the empirical error, \( \frac{1}{2}{\left\Vert \omega \right\Vert}^2 \) is the regularization term based on the Euclidean norm of ω, c is the cost constant that weights the empirical risk, \( {d}_i \) is the actual value, and \( {y}_i \) is the predicted value.

$$ {L}_{\varepsilon}\left(d,y\right)=\begin{cases}\left|d-y\right|-\varepsilon, & \left|d-y\right|\ge \varepsilon \\ 0, & \text{otherwise}\end{cases} $$
(12)

Eq. (12) defines the ε-insensitive loss function used in Eq. (11).

Minimize

$$ {R}_{SVR}\left(\omega, \xi, {\xi}^{\ast}\right)=\frac{1}{2}{\left\Vert \omega \right\Vert}^2+c\sum \limits_{i=1}^n\left({\xi}_i+{\xi}_i^{\ast}\right) $$
(13)

subject to

$$ {\displaystyle \begin{array}{c}{d}_i-\omega \varphi \left({x}_i\right)-b\le \varepsilon +{\xi}_i\\ {}\omega \varphi \left({x}_i\right)+b-{d}_i\le \varepsilon +{\xi}_i^{\ast}\\ {}{\xi}_i,{\xi}_i^{\ast}\ge 0\end{array}} $$

Eq. (13) minimizes the regularized training error, where \( {\xi}_i \) and \( {\xi}_i^{\ast } \) are slack variables for deviations above and below the ε-tube, respectively.
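The ε-insensitive loss of Eq. (12) and the regularized risk of Eq. (11) can be sketched in NumPy for a linear feature map φ(x) = x. To keep the example self-contained, the weights are fitted by plain subgradient descent on the primal risk. SVR solvers typically handle the constrained problem of Eq. (13) through its dual instead. The linear φ, the learning rate, the epoch count, and the function names are all assumptions made for illustration.

```python
import numpy as np


def eps_insensitive_loss(d, y, eps):
    """Eq. (12): zero inside the eps-tube, |d - y| - eps outside it."""
    r = np.abs(np.asarray(d, float) - np.asarray(y, float))
    return np.where(r >= eps, r - eps, 0.0)


def svr_risk(w, b, X, d, c, eps):
    """Eq. (11), assuming phi(x) = x: c * mean loss + 0.5 * ||w||^2."""
    y = X @ w + b
    return c * eps_insensitive_loss(d, y, eps).mean() + 0.5 * w @ w


def fit_linear_svr(X, d, c=10.0, eps=0.1, lr=0.01, epochs=2000):
    """Assumption: subgradient descent on the primal risk of Eq. (11), not a QP solver."""
    n, m = X.shape
    w, b = np.zeros(m), 0.0
    for _ in range(epochs):
        r = X @ w + b - d  # prediction minus target
        # Subgradient of the eps-insensitive loss with respect to the prediction
        g = np.where(r > eps, 1.0, np.where(r < -eps, -1.0, 0.0))
        w -= lr * (c * X.T @ g / n + w)
        b -= lr * c * g.mean()
    return w, b


X = np.linspace(0, 1, 50)[:, None]
d = 2.0 * X[:, 0] + 1.0
w, b = fit_linear_svr(X, d)
print(svr_risk(w, b, X, d, 10.0, 0.1) < svr_risk(np.zeros(1), 0.0, X, d, 10.0, 0.1))  # True
```

Points that fall inside the ε-tube contribute a zero subgradient. For this reason, only the points on or outside the tube, which play the role of support vectors, move the fitted line.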

3.3.3 Random Forest regression (RF)

Random Forest Regression (RF) belongs to the family of ensemble learning methods and is used for predictive analysis. The random decision forest approach was introduced by Tin Kam Ho in 1995 [54].

The RF model is given in Eq. (14), which averages the predictions of K trees h(x; θk), each built with its own random parameter vector θk:

$$ \overline{h}(x)=\frac{1}{K}{\sum}_{k=1}^Kh\left(x;{\theta}_k\right) $$
(14)

As the number of trees grows (K → ∞), the prediction error of the forest converges as follows:

$$ {E}_{x,y}{\left(Y-\overline{h}(x)\right)}^2\to {E}_{x,y}{\left(Y-{E}_{\theta }h\left(x;\theta \right)\right)}^2 $$
(15)

Eq. (15) shows that adding more trees does not make the random forest overfit, because its prediction error converges to a limiting value. The average prediction error of an individual tree h(x; θ) is defined in Eq. (16):

$$ P{E}_t^{\ast }={E}_{\theta }{E}_{x,y}{\left(Y-h\left(x;\theta \right)\right)}^2 $$
(16)

Assume that every tree is unbiased for all θ, i.e.,

$$ EY={E}_xh\left(x;\theta \right) $$

Then,

$$ P{E}_f^{\ast}\le \overline{p}P{E}_t^{\ast } $$
(17)

where \( P{E}_f^{\ast } \) is the prediction error of the forest, and \( \overline{p} \) is the weighted correlation between the residuals y − h(x; θ) and y − h(x; θ′) for independent θ and θ′.
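A small NumPy sketch illustrates the averaging in Eq. (14). Here each tree is deliberately simplified to a one-split regression stump, fitted on a bootstrap sample of 1-D data, so the random vector θk reduces to the bootstrap draw. A full random forest instead grows deep CART trees and also samples features at each split. The stump learner, K = 50, and the function names are illustrative assumptions.

```python
import numpy as np


def fit_stump(x, y):
    """Assumption: a one-split regression tree on 1-D input, minimizing squared error."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_sse, thr, left_val, right_val = np.inf, xs[0], ys.mean(), ys.mean()
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate equal values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, thr = sse, 0.5 * (xs[i - 1] + xs[i])
            left_val, right_val = left.mean(), right.mean()
    return lambda q: np.where(np.asarray(q, float) <= thr, left_val, right_val)


def fit_forest(x, y, K=50, seed=0):
    """Grow K stumps, each on its own bootstrap sample (the random theta_k)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    trees = []
    for _ in range(K):
        idx = rng.integers(0, n, size=n)  # sample n points with replacement
        trees.append(fit_stump(x[idx], y[idx]))
    return trees


def forest_predict(trees, q):
    """Eq. (14): average the K tree predictions."""
    return np.mean([tree(q) for tree in trees], axis=0)


x = np.arange(20, dtype=float)
y = np.where(x >= 10, 5.0, 0.0)
trees = fit_forest(x, y)
print(forest_predict(trees, [2.0, 17.0]))
```

Because every stump sees a different bootstrap sample, the split thresholds vary from tree to tree. Averaging the trees smooths the prediction near the step, which is the variance reduction that Eq. (17) quantifies through the correlation \( \overline{p} \).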

3.3.4 Long short-term memory (LSTM)

The Long Short-Term Memory (LSTM) network, an extension of the Recurrent Neural Network (RNN), was proposed by Hochreiter and Schmidhuber [55]. It was developed to overcome the shortcomings of earlier RNNs by adding interacting cells, or modules. In other words, the LSTM is a special type of RNN that can remember past information and learn long-term dependencies. As Olah [56] explains, an LSTM can be organized as a chain of repeating modules. Each module contains four interacting layers instead of the single layer of a standard RNN, and these four components communicate in a particular way. The basic structure of the LSTM model is shown in Fig. 3.

Fig. 3 LSTM Model Structure

The LSTM model is defined mathematically by the following equations:

The Input gate is calculated by Eq. 18:

$$ {i}_t=\sigma \left({W}_i{x}_t+{X}_i{h}_{t-1}\right) $$
(18)

where σ denotes the sigmoid function, \( {h}_{t-1} \) is the output of the previous unit, \( {W}_i \) is the weight matrix applied to the input \( {x}_t \), and \( {X}_i \) is the weight matrix applied to \( {h}_{t-1} \).

The Forget gate is calculated by Eq. 19:

$$ {f}_t=\sigma \left({W}_f{x}_t+{X}_f{h}_{t-1}\right) $$
(19)

where σ denotes the sigmoid function, \( {h}_{t-1} \) is the output of the previous unit, \( {W}_f \) is the weight matrix applied to the input \( {x}_t \), and \( {X}_f \) is the weight matrix applied to \( {h}_{t-1} \).

The Output/Exposure gate is calculated by Eq. 20:

$$ {o}_t=\sigma \left({W}_o{x}_t+{X}_o{h}_{t-1}\right) $$
(20)

where σ denotes the sigmoid function, \( {h}_{t-1} \) is the output of the previous unit, \( {W}_o \) is the weight matrix applied to the input \( {x}_t \), and \( {X}_o \) is the weight matrix applied to \( {h}_{t-1} \).

The New memory cell is calculated by Eq. 21:

$$ \overset{\sim }{c_t}=\tanh \left({W}_c{x}_t+{X}_c{h}_{t-1}\right) $$
(21)

where \( {h}_{t-1} \) is the output of the previous unit, and \( {W}_c \) and \( {X}_c \) are the weight matrices applied to \( {x}_t \) and \( {h}_{t-1} \), respectively.

The Final memory cell is calculated by Eq. 22:

$$ {c}_t={f}_t\times {c}_{t-1}+{i}_t\times \overset{\sim }{c_t} $$
(22)

where \( {f}_t \) is the forget gate, \( {i}_t \) is the input gate, \( {c}_{t-1} \) is the final memory cell at time t − 1, and \( \overset{\sim }{c_t} \) is the new memory cell at time t; the products are element-wise.

Finally, the output of the unit is obtained as

$$ {h}_t={o}_t\times \tanh \left({c}_t\right) $$
(23)

where \( {o}_t \) and \( {c}_t \) denote the output/exposure gate and the final memory cell, respectively.
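A single LSTM step that follows Eqs. (18) to (23) can be sketched in NumPy as shown below. As in the equations, bias terms are omitted. `W[g]` holds the input weights \( {W}_g \) and `X[g]` holds the recurrent weights \( {X}_g \) for each gate g. The random initialization, the layer sizes, and the function names are illustrative assumptions. In practice, a deep learning framework would be used to train these weights.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, X):
    """One LSTM time step; W[g] multiplies x_t and X[g] multiplies h_{t-1} (no biases)."""
    i_t = sigmoid(W["i"] @ x_t + X["i"] @ h_prev)      # Eq. (18) input gate
    f_t = sigmoid(W["f"] @ x_t + X["f"] @ h_prev)      # Eq. (19) forget gate
    o_t = sigmoid(W["o"] @ x_t + X["o"] @ h_prev)      # Eq. (20) output/exposure gate
    c_tilde = np.tanh(W["c"] @ x_t + X["c"] @ h_prev)  # Eq. (21) new memory cell
    c_t = f_t * c_prev + i_t * c_tilde                 # Eq. (22) final memory cell
    h_t = o_t * np.tanh(c_t)                           # Eq. (23) output
    return h_t, c_t


def init_weights(n_in, n_hidden, seed=0):
    """Hypothetical random initialization of the four gates' weight matrices."""
    rng = np.random.default_rng(seed)
    W = {g: rng.normal(0, 0.1, (n_hidden, n_in)) for g in "ifoc"}
    X = {g: rng.normal(0, 0.1, (n_hidden, n_hidden)) for g in "ifoc"}
    return W, X


# Run the cell over a short univariate sequence (e.g. scaled daily case counts)
W, X = init_weights(n_in=1, n_hidden=4)
h, c = np.zeros(4), np.zeros(4)
for value in [0.1, 0.2, 0.4, 0.7]:
    h, c = lstm_step(np.array([value]), h, c, W, X)
print(h.shape)  # (4,)
```

The forget gate scales the previous cell state \( {c}_{t-1} \) rather than the new memory cell. This additive path through the cell state is what lets the LSTM carry information across many time steps.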

3.4 Statistical analysis

For the statistical analysis, two performance evaluation metrics were used: Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE). These metrics are used to assess the performance, accuracy, and suitability of the forecasting models. This section presents their mathematical foundations [57].

3.4.1 Root mean square error (RMSE)

The RMSE (Root Mean Square Error) is an important statistical measure based on the standard deviation of the residuals, and it is used to validate predicted results. Residuals are the distances between the regression line and the data points. In Eq. 24, N is the number of errors, \( {\left({\hat{y}}_i-{y}_i\right)}^2 \) are the squared errors, and \( {\hat{y}}_i \) and \( {y}_i \) denote the forecasted and observed values, respectively.

$$ RMSE=\sqrt{\frac{1}{N}{\sum}_{i=1}^N{\left({\hat{y}}_i-{y}_i\right)}^2} $$
(24)
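Eq. (24) can be computed as follows. The function name `rmse` is an illustrative assumption.

```python
import numpy as np


def rmse(y_obs, y_pred):
    """Eq. (24): square root of the mean squared forecast error."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_obs) ** 2)))


print(rmse([100, 200, 300], [110, 190, 300]))
```

Because errors are squared before averaging, RMSE is expressed in the same unit as the data (here, number of cases) and penalizes large errors more heavily than small ones.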

3.4.2 Mean absolute percentage error (MAPE)

The MAPE (Mean Absolute Percentage Error) is an important statistical measure of the accuracy of a forecasting system. In Eq. 25, N is the number of predicted values, and \( {X}_i \) and \( {Y}_i \) denote the predicted and actual values, respectively.

$$ MAPE=\frac{1}{N}\sum \limits_{i=1}^N\left|\ \frac{Y_i-{X}_i}{Y_i}\right|\times 100 $$
(25)
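Eq. (25) can be implemented in the same way. The function name `mape` and the choice to raise an error on zero actual values are assumptions made for this sketch.

```python
import numpy as np


def mape(actual, predicted):
    """Eq. (25): mean absolute percentage error, in percent."""
    Y = np.asarray(actual, dtype=float)
    Xp = np.asarray(predicted, dtype=float)
    if np.any(Y == 0):
        # Assumption: refuse to compute rather than silently skipping zero actuals
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs((Y - Xp) / Y)) * 100)


print(mape([100, 200], [110, 180]))
```

MAPE divides by the actual value, so it is undefined whenever a count is zero. This situation may arise for countries with no reported cases early in the 31/12/2019 to 25/06/2020 window, and such days would need to be handled explicitly.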

4 Results

This section presents the COVID-19 outbreak prediction results of five prediction models, i.e., Linear Regression (LR), Random Forest Regression (RFR), Support Vector Regression (SVR), Long Short-Term Memory (LSTM), and our proposed Enhanced Multi-Task Gaussian Process Regression (MTGP).

4.1 Model identification and hyperparameter selection

Finding an accurate forecasting model for the worldwide COVID-19 outbreak is a very complex task. The fundamental purpose of this study is to build a forecasting model that can accurately forecast the COVID-19 epidemic across the world. Accurate forecasting is essential because all preventive measures depend on the forecasting results. With an accurate forecasting model, the spread of the outbreak can be reduced by planning accordingly.

Fig. 4 summarizes the four traditional models and the proposed model, i.e., Linear Regression (LR), Random Forest Regression (RFR), Support Vector Regression (SVR), Long Short-Term Memory (LSTM), and Enhanced Multi-Task Gaussian Process Regression (MTGP). These models were used to forecast the COVID-19 outbreak, with the aim of determining the suitability and accuracy of the proposed model relative to the traditional models.

Fig. 4 Prediction Models: A Quick Lookup

Table 2 shows the hyperparameter settings used in the experimental evaluation. Its columns are Prediction Model, Hyperparameter, Parameter Selection, and Best Hyperparameter Used. Before the experimental evaluation, each model was tuned under various selection criteria to determine its best parameters, which were then used in the experiments.

Table 2 Hyperparameter selection

4.2 Model performance evaluation on COVID-19 outbreak worldwide

Table 3 shows the performance of the five forecasting models on the COVID-19 outbreak. The results are divided into five parts. The first part evaluates the forecasting of confirmed cases worldwide, and the remaining four parts evaluate country-specific forecasts of confirmed cases for China, India, Italy, and the USA. The forecasts were evaluated using two performance measures, MAPE and RMSE (lower is better for both), to identify the most suitable forecasting model. All experiments were performed for five forecast horizons: 1, 3, 5, 10, and 15 days ahead. The proposed MTGP model achieved the lowest MAPE and RMSE throughout the experiment, across all horizons and regions.
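The text does not spell out how the 1-, 3-, 5-, 10-, and 15-day-ahead evaluations are organized. One plausible protocol, sketched below as an assumption rather than the authors' procedure, works as follows. For each horizon h, hold out the last h days of a cumulative series. Ask a model for h forecasts from the remaining history. Then score those forecasts with RMSE and MAPE. The naive last-value forecaster and the synthetic series are hypothetical stand-ins for the five models and the WHO data.

```python
import numpy as np

HORIZONS = (1, 3, 5, 10, 15)


def naive_forecast(history, h):
    """Hypothetical baseline: repeat the last observed value for h days."""
    return np.full(h, history[-1], dtype=float)


def evaluate_horizons(series, forecast_fn, horizons=HORIZONS):
    """Assumption: hold out the last h points for each horizon and score RMSE and MAPE."""
    series = np.asarray(series, dtype=float)
    results = {}
    for h in horizons:
        train, test = series[:-h], series[-h:]
        err = forecast_fn(train, h) - test
        results[h] = {
            "RMSE": float(np.sqrt(np.mean(err ** 2))),
            "MAPE": float(np.mean(np.abs(err / test)) * 100),
        }
    return results


cases = 100 + 10 * np.arange(60)  # synthetic cumulative counts
for h, scores in evaluate_horizons(cases, naive_forecast).items():
    print(h, round(scores["RMSE"], 2), round(scores["MAPE"], 2))
```

In a real comparison, `naive_forecast` would be replaced by a wrapper around each trained model. The errors typically grow with the horizon, as they do here, because each additional day compounds the forecast uncertainty.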

Table 3 Performance evaluation of the prediction algorithms (confirmed cases)

5 Discussion

To obtain the results, we built the dataset from the WHO health bulletins, using the daily bulletin reports from 31/12/2019 to 25/06/2020. The data cover approximately 213 affected countries, with the country/territory/area name, date, total number of confirmed cases, total number of deaths, number of new confirmed cases, number of new deaths, total number of recovered patients, total number of active patients, and transmission classification. Default settings were used for the four machine learning models, i.e., Linear Regression (LR), Random Forest Regression (RFR), Support Vector Regression (SVR), and Long Short-Term Memory (LSTM), whereas the MTGP model used the improved kernel matrix. For time series prediction, we followed the same training and testing mechanism as in the traditional GPR model, except for the kernel matrix. In the proposed MTGP forecasting model, the dataset is given as input, and the historical and reference series are obtained as output.

Figs. 5 and 6 show the prediction performance of the forecasting models for China, India, Italy, the USA, and worldwide, measured by Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE). These measures were computed for each model to determine the most suitable forecasting model. All experiments were performed for five horizons: 1, 3, 5, 10, and 15 days ahead. The results show that the proposed model outperforms all other models, achieving the lowest MAPE and RMSE throughout the experiment for every horizon and region.

Fig. 5 Bar Graph of the Results based on Mean Absolute Percentage Error (MAPE) of Confirmed Cases: (a) Worldwide, (b) China, (c) India, (d) Italy, (e) USA

Fig. 6 Bar Graph of the Results based on Root Mean Square Error (RMSE) of Confirmed Cases: (a) Worldwide, (b) China, (c) India, (d) Italy, (e) USA

In Figs. 7, 8, 9, 10 and 11, the y-axis shows the number of infected patients, and the x-axis shows the date. These figures plot the 15-day-ahead forecasts of confirmed cases for Worldwide, China, India, Italy, and the USA, respectively. Five forecasting models were used in these experiments: LR, SVR, RF, LSTM, and the proposed MTGP model. The prediction results for Worldwide are presented in Fig. 7a to e, and those for China, India, Italy, and the USA in Figs. 8a to e, 9a to e, 10a to e, and 11a to e, respectively. All the models simulate the course of the COVID-19 outbreak relatively well, but the proposed enhanced MTGP model performs best among all the forecasting models.

Fig. 7 Graph of 15-Day-Ahead Forecasting of Confirmed Cases in the COVID-19 Outbreak Worldwide: (a) LR Model, (b) SVR Model, (c) RFR Model, (d) LSTM Model, (e) Proposed MTGP Model

Fig. 8 Graph of 15-Day-Ahead Forecasting of Confirmed Cases in the COVID-19 Outbreak for China: (a) LR Model, (b) SVR Model, (c) RFR Model, (d) LSTM Model, (e) Proposed MTGP Model

Fig. 9 Graph of 15-Day-Ahead Forecasting of Confirmed Cases in the COVID-19 Outbreak for India: (a) LR Model, (b) SVR Model, (c) RFR Model, (d) LSTM Model, (e) Proposed MTGP Model

Fig. 10 Graph of 15-Day-Ahead Forecasting of Confirmed Cases in the COVID-19 Outbreak for Italy: (a) LR Model, (b) SVR Model, (c) RFR Model, (d) LSTM Model, (e) Proposed MTGP Model

Fig. 11 Graph of 15-Day-Ahead Forecasting of Confirmed Cases in the COVID-19 Outbreak for the USA: (a) LR Model, (b) SVR Model, (c) RFR Model, (d) LSTM Model, (e) Proposed MTGP Model

5.1 Significance of IoT in the detection of COVID-19

The reality is that technology cannot replace humans, because humans hold decision-making authority and technology has limited capabilities. Nevertheless, humans can adopt the latest technologies to make their lives simpler and smoother. The keywords that define IoT are efficiency, convenience, and automation. IoT technology is a game-changer not only in industry but also in healthcare, because it connects heterogeneous devices through wired or wireless links and exchanges data with cloud base stations. The early adoption of IoT in healthcare can already be seen in smart sensors, remote patient monitoring, and the integration of various medical devices. In addition, applications such as wearable biometric sensors, sensor-based activity recognition, medication dispensers, glucose monitors, and smart beds have accelerated IoT technology and given rise to the IoHT (Internet of Healthcare Things). The significance of IoT devices in disease detection is discussed below.

5.1.1 In turning data directly into action

Data should always be ready to be turned into action, and if we use IoT devices for health monitoring, we do not have to worry about data availability during an epidemic. During the COVID-19 outbreak, we face data-related issues because we do not have accurate numbers of infected patients across the world. Some data are available from regions where smart healthcare infrastructure has been developed, but they are very limited. We depend mainly on the WHO, which, to some extent, manages the data of all countries on the current situation. This remains one of the most significant issues to date: without reliable data, we can neither control the spread rate of the virus nor make an efficient prevention plan. If such infrastructure is in place in the future, we will be far better prepared, because we will have enough data and can plan accordingly.

5.1.2 In improving the health of patients

With the help of wearable IoT devices, people can track their health and take precautions to improve it. In the context of the COVID-19 outbreak, IoT devices would allow us to follow each ill person and take appropriate action to minimize the spread. If we collect data from various tracking devices, we can identify the most exposed areas in terms of disease spread, and people who escape from hospitals or isolation wards can also be tracked. We can also keep an eye on suspected cases. Overall, IoT devices can help us efficiently control the exposure rate and improve the health of patients.

5.1.3 In the promotion of preventative measures

With the help of IoT infrastructure, we can circulate prevention methods to the public and make people aware of the consequences and effects of the disease on human health. Because IoT devices can generate data in real time, we can share this information with the public so that people can keep track of the epidemic situation. In addition, we can warn people not to go into areas where the highest exposure is taking place. Thus, a real-time system such as IoT for healthcare can help promote awareness among people and guide them on how to deal with epidemic situations.

5.1.4 In the advancement of healthcare management strategies

Because of COVID-19, people who suffer from respiratory diseases need extensive care. With the help of a health dataset, we can identify and track the people who have such problems. Prevention strategies can be built from real-time data on disease spread, and efficient management strategies can then be applied. This was not possible earlier because real data were not available. Advanced management strategies therefore need real-time data to deal with epidemic diseases such as COVID-19, which is possible only when IoT infrastructure is used in the healthcare system.

5.2 Persons who need extensive care against COVID-19

COVID-19 is a contagious viral disease that has spread widely. It can affect people of all age groups, but it is deadliest for people who suffer from the diseases listed in Table 4. Such people need extensive care to be protected from COVID-19. Table 4 lists the persons who need extensive care [11, 58, 59].

Table 4 Persons who need extensive care

5.3 Possible solutions

Detecting novel infectious diseases with existing resources and definitions is a step-by-step process. Real-time data and analytics are needed to take preventive measures. With accurate information and quick action, we can reduce the overall impact of novel diseases worldwide, which can also reduce their social and economic impact on countries. Real-time tracking and detection of novel diseases is an arduous task, but IoT makes it possible. IoT can play a vital role in detecting and controlling outbreaks of viral infectious diseases in real time.

5.3.1 Real-time tracking of disease spread

The emergence of technologies such as big data and IoT in healthcare opens a new area of work for researchers. With IoT, we can obtain data from various places in real time, whereas earlier this was possible only manually or not at all. Devices such as smart thermometers and smart thermal cameras play a vital role in the initial screening of people, helping to detect those who may be suffering from COVID-19. Intelligent devices such as thermal cameras can be installed in places such as airports, railway stations, bus stations, and home doors, which helps distinguish potentially ill persons from healthy ones. All systems should be integrated with base stations (except in the case of home automation), where the data can be monitored and the desired action planned accordingly. These systems operate in real time, meaning that the data are collected and analyzed as they arrive. Thus, IoT-based systems make it possible to track the spread rate and spread density of the disease in real time. Since IoT interconnects heterogeneous devices and systems, it can also collect data from remote areas. Integrating these data into a global health system will help not only in the real-time tracking of diseases but also in prediction mechanisms for preventing disease spread.
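As a rough illustration of the screening-and-aggregation idea described above, the following Python sketch flags temperature readings sent by checkpoint devices and computes, per checkpoint, the fraction of screened people who were flagged, a crude indicator of spread density. It is a minimal sketch under stated assumptions: the `Reading` record, the checkpoint names, and the 38.0 °C threshold are hypothetical and not taken from this paper, and a fever flag only marks a person for follow-up testing, not a COVID-19 diagnosis.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical screening threshold in degrees Celsius (illustrative, not from the paper).
FEVER_THRESHOLD_C = 38.0


@dataclass
class Reading:
    """One hypothetical temperature reading sent by a checkpoint device."""
    checkpoint: str        # hypothetical checkpoint identifier, e.g. "airport"
    person_id: str
    temperature_c: float


def screen(reading: Reading) -> bool:
    """Return True if the reading should be flagged for follow-up testing."""
    return reading.temperature_c >= FEVER_THRESHOLD_C


def flag_rate_by_checkpoint(readings: Iterable[Reading]) -> Dict[str, float]:
    """Aggregate at the base station: fraction of flagged people per checkpoint."""
    totals: Dict[str, int] = defaultdict(int)
    flagged: Dict[str, int] = defaultdict(int)
    for r in readings:
        totals[r.checkpoint] += 1
        if screen(r):
            flagged[r.checkpoint] += 1
    return {cp: flagged[cp] / totals[cp] for cp in totals}


# Toy readings (illustrative values only).
readings = [
    Reading("airport", "p1", 36.8),
    Reading("airport", "p2", 38.4),
    Reading("railway", "p3", 37.1),
    Reading("railway", "p4", 36.5),
]
print(flag_rate_by_checkpoint(readings))
```

Reporting a rate rather than a raw count keeps busy checkpoints from looking worse simply because more people pass through them.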

5.3.2 IoT based responding and monitoring for public health

IoT can change individuals' lives through the integration of smart clinics. Smart clinics are clinics that use various IoT devices and collect data in real time. They can also communicate with other smart clinics and share information for medical studies in real time. These smart clinics are connected to a central server, where all the data are collected and monitored in real time. All patient records should be available in the primary database, so that it is easy for the government to take preventive measures when an epidemic arises. The primary aim of such infrastructure is to enable IoT devices and the associated health clinics to manage public health crises. It will also help in detecting suspected outbreaks as well as confirmed outbreaks caused by viral diseases. When suspicion of an outbreak grows, IoT devices can be used to obtain focused data by locating the epidemic source. Similarly, the same network can be used to provide medication and health services in the affected areas.

5.3.3 Employing efficient mechanisms for prevention of infection

The lack of available and relevant data is a major hurdle to making prevention policies. Data are an essential requirement for developing appropriate approaches and testing hypotheses, and they play a vital role in controlling infectious diseases. IoT and advanced technology in a proficient healthcare system can help overcome these limitations. IoT-based systems can collect data from various locations and sources. These data are then sent to a central health server, where they serve as input for researchers' analyses. Based on the results of the analysis, we can assess the impact of the outbreak and plan a preventive roadmap to minimize its overall effect. As the whole world is suffering from COVID-19, preventive measures are essential to minimize the overall impact. With IoT-based smart systems, we can easily find highly infected areas and make effective decisions, such as where to place sanitizing points, which areas should be locked down, and where massive health support is needed.
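The area-level decisions mentioned above (sanitizing points, lockdowns, health support) can be sketched as a simple ranking of areas by incidence rate. This is a hypothetical illustration, not a method from this paper: the thresholds, the action labels, and the input format (confirmed cases and population per area) are assumptions, and real policy decisions would rely on much richer data.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds in confirmed cases per 100,000 residents (not from the paper).
LOCKDOWN_THRESHOLD = 100.0
SANITIZING_THRESHOLD = 20.0


def incidence_per_100k(cases: int, population: int) -> float:
    """Confirmed cases per 100,000 residents."""
    return cases * 100_000 / population


def prioritize(areas: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float, str]]:
    """Rank areas from most to least affected and attach a suggested action.

    `areas` maps a hypothetical area name to (confirmed_cases, population).
    """
    ranked = []
    for name, (cases, population) in areas.items():
        rate = incidence_per_100k(cases, population)
        if rate >= LOCKDOWN_THRESHOLD:
            action = "lockdown and health support"
        elif rate >= SANITIZING_THRESHOLD:
            action = "sanitizing points"
        else:
            action = "monitor"
        ranked.append((name, rate, action))
    # Highest incidence first, so scarce resources go to the worst-hit areas.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy data: area -> (confirmed cases, population); values are illustrative only.
areas = {"A": (150, 100_000), "B": (30, 50_000), "C": (5, 200_000)}
for name, rate, action in prioritize(areas):
    print(f"{name}: {rate:.1f} per 100k -> {action}")
```

Normalizing by population keeps a large city from outranking a small town purely because of its size; ranking by raw counts instead would be a one-line change in `prioritize`.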

6 Conclusions

In an epidemic, any small finding can help a lot. In the current pandemic, forecasting the COVID-19 outbreak plays an important role: it gives an idea of how widely the disease will spread in the coming days, which helps governments take preventive measures to minimize its spread. Accurate and efficient forecasting of outbreaks in any epidemic situation is a complex and challenging task. It becomes even more complicated when preventive measures depend on the predictions and real-time data are unavailable. In this paper, a COVID-19 outbreak forecasting method has been proposed to address the limitations of traditional models. For the experimental evaluation, we compared four traditional forecasting models, i.e., Linear Regression (LR), Random Forest Regression (RFR), Support Vector Regression (SVR), and Long Short-Term Memory (LSTM), with our proposed MTGP forecasting model to assess its suitability and accuracy. Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE) were used as performance measures, and each model was evaluated with them to determine the most suitable forecasting model. All experiments were performed under five prediction criteria: 1-day, 3-day, 5-day, 10-day, and 15-day ahead. The results show that our proposed model performs better than all other models, achieving the lowest MAPE and RMSE throughout the experiments under all prediction criteria. In addition, we discussed the significance of IoT in healthcare, the importance of IoT for COVID-19 detection, and possible IoT-based solutions for minimizing the impact of COVID-19.
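For reference, the two performance measures can be computed as follows, assuming their standard definitions: MAPE = (100/n) Σ |(y_t - ŷ_t) / y_t|, reported in percent, and RMSE = sqrt((1/n) Σ (y_t - ŷ_t)²), where y_t is the actual and ŷ_t the forecast number of confirmed cases on day t. The hypothetical `evaluate_horizons` helper reads an h-day-ahead criterion as scoring the first h days of a multi-day forecast; this is an assumption, and the paper's exact evaluation protocol may differ. The numbers in the usage example are toy values, not results from the experiments.

```python
import math
from typing import Dict, Sequence, Tuple


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    """Both series must be non-empty and aligned day by day."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent.

    Assumes no actual value is zero; a ZeroDivisionError is raised otherwise.
    """
    _check(actual, predicted)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error, in the same units as the case counts."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def evaluate_horizons(
    actual: Sequence[float],
    predicted: Sequence[float],
    horizons: Sequence[int] = (1, 3, 5, 10, 15),
) -> Dict[int, Tuple[float, float]]:
    """Return {h: (MAPE, RMSE)} computed on the first h forecast days.

    Horizons longer than the available series are skipped.
    """
    return {
        h: (mape(actual[:h], predicted[:h]), rmse(actual[:h], predicted[:h]))
        for h in horizons
        if h <= len(actual)
    }


# Toy example (illustrative numbers only).
actual = [100, 200, 400]
predicted = [110, 190, 400]
scores = evaluate_horizons(actual, predicted, horizons=(1, 3))
for h, (m, r) in scores.items():
    print(f"{h}-day: MAPE={m:.2f}%  RMSE={r:.2f}")
```

MAPE is scale-free, which makes it comparable across countries with very different case counts, whereas RMSE stays in case units and penalizes large misses more heavily; reporting both, as the paper does, covers both views.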

In the future, we will analyze the proposed model with different datasets and explore further techniques to improve its efficiency. The Sparse Gaussian Process method is one probable solution.