Article

Nonlinear Combinational Dynamic Transmission Rate Model and Its Application in Global COVID-19 Epidemic Prediction and Analysis

1
School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
2
School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
*
Author to whom correspondence should be addressed.
Mathematics 2021, 9(18), 2307; https://doi.org/10.3390/math9182307
Submission received: 24 August 2021 / Revised: 11 September 2021 / Accepted: 16 September 2021 / Published: 18 September 2021
(This article belongs to the Special Issue Mathematical Modeling and Analysis in Biology and Medicine)

Abstract

The outbreak of coronavirus disease 2019 (COVID-19) has caused a global disaster, seriously endangering human health and the stability of social order. The purpose of this study is to construct a nonlinear combinational dynamic transmission rate model with automatic sub-model selection, based on the forecasting effective measure (FEM) and support vector regression (SVR), in order to overcome two shortcomings: the difficulty of accurately estimating the basic reproduction number $R_0$ and the low accuracy of single-model predictions. We apply the model to analyze and predict the COVID-19 outbreak in different countries. First, the discrete values of the dynamic transmission rate are calculated. Second, the prediction abilities of all single models are comprehensively considered, and the best sliding window period is derived. Then, based on FEM, the optimal sub-model is selected, and the prediction results are nonlinearly combined. Finally, a nonlinear combinational dynamic transmission rate model is developed to analyze and predict the COVID-19 epidemic in the United States, Canada, Germany, Italy, France, Spain, South Korea, and Iran, as well as the global epidemic. The experimental results show that our model achieves an average out-of-sample forecasting error rate no higher than 10.07%, and that the predicted COVID-19 epidemic inflection points in most countries agree well with the real data. In addition, our model has good anti-noise ability and stability when dealing with data fluctuations.

1. Introduction

Coronavirus disease 2019 (COVID-19), which was first reported in Hubei, China at the end of 2019, spread globally during 2020. The World Health Organization (WHO) named this disease COVID-19 on 11 February 2020 and made the assessment that COVID-19 can be characterized as the third pandemic caused by a coronavirus, after SARS-CoV (2002) and MERS-CoV (2012) [1]. According to a report from Johns Hopkins University, by 18 July 2021, there were 183.82 million confirmed cases and 3.97 million deaths from COVID-19 worldwide. More than 190 countries have confirmed cases of COVID-19. The cumulative number of confirmed cases in eight countries exceeds 500,000, including the United States (USA), India, Brazil, France, Russia and Turkey, and four countries (USA, India, Brazil and Mexico) have more than 200,000 deaths. There is no doubt that COVID-19 poses a severe threat to the health and livelihoods of people all over the world. In the face of the unchecked spread of the epidemic, it is important to discover its internal dynamic laws and to build an effective epidemic model for analyzing and predicting the spread of the disease. Such a model can help minimize the impact of the epidemic on communities and economies and help countries around the world take reasonable epidemic prevention and control measures.
The prediction of an epidemic refers to the application of dynamical, statistical, and other models to estimate the trend of the number of future cases, this being the compass of epidemic prevention and control. According to the prediction principle, the existing COVID-19 epidemic models can be divided into the dynamical models based on mechanistic analysis and the statistical models based on actual data. Among the dynamical models, the SIR model, the SEIR model, and their extensions can reflect the intrinsic dynamical characteristics of the spread of infectious diseases and can better reproduce the development process of diseases, reveal the epidemic law, and predict the change trend, so they are favored by researchers. Yang et al. [2] used the improved SEIR model to predict and analyze the epidemic under the prevention and control policies, and the experimental results showed that strong prevention and control policies could effectively inhibit the spread of the epidemic. Wu et al. [3] predicted the global spread of COVID-19 using an improved SEIR model. Lopez et al. [4] used the modified SEIR model to predict the COVID-19 outbreak in Spain and Italy. Ashutosh et al. [5] estimated pure asymptomatic infection cases based on a SEIR model. The results showed that infection may last for a long time without a vaccine. Tang et al. [6] proposed that the risk of secondary outbreaks can be effectively reduced by intermittent population mobility and effective isolation of infected people in the floating population based on a novel stochastic discrete transmission model. In the meantime, the data-driven statistical models were also widely used in the prediction and analysis of COVID-19 epidemics, including function fitting models [7,8,9], machine learning [10,11], deep learning [12,13], and time series models [14,15].
It is well known that the basic reproduction number $R_0$ plays an important role in such dynamical models. However, this parameter is difficult to estimate accurately because it varies with a number of factors [16]. To overcome this difficulty, Huang et al. [17] proposed a simple, data-driven calculation of the dynamic propagation rate, based on the law of natural growth, to replace $R_0$. Subsequently, Hu et al. [18] used a power function to fit the dynamic transmission rate in order to predict and analyze the epidemic in China, and Hu et al. [19] developed a dynamic growth rate model (DGRM) to predict and analyze the epidemic abroad. Both studies used a single prediction model that could not be adapted to the actual situation. Inspired by existing work [17,18,19], Xie et al. [20] proposed a nonlinear combinational dynamic transmission rate model based on support vector regression (SVR) to analyze the epidemic situation in major cities in China. Their work alleviated the insufficient prediction accuracy of single models, but the sub-models were selected manually, which lacked interpretability and left the selection method in need of improvement. In addition, for complex epidemic data from outside China, the results obtained with a single kernel function are often unsatisfactory.
In the present work, we propose an improved nonlinear combinational dynamic transmission rate model (INCDTRM) that automatically selects the optimal sub-model based on the forecasting effective measure (FEM) and SVR. We employ the model to predict the existing cases and inflection points of COVID-19 for the USA, Canada, Germany, Italy, France, Spain, South Korea, Iran and the global epidemic, and we present a comparative case analysis. The results show that such a combinational model is able to predict new cases with very high accuracy, above 95% in some countries. This study may help in understanding the development trend of COVID-19 and the effectiveness of mitigation measures.
The remainder of this article is organized as follows. Section 2 introduces the dynamic transmission rate, and explains the use of SVR and FEM. Section 3 discusses our simulations and empirical studies, including dataset description, choice of sliding window period, weight and parameters setting in SVR, comparative analysis of different models, inflection point prediction, sensitivity analysis and the global epidemic forecast. Some conclusions are drawn in Section 4.

2. Methodology

2.1. Dynamic Transmission Rate

Let $N(t) = L(t) - K(t) - D(t)$ be the existing number of cases at time $t$, where $L(t)$, $K(t)$, and $D(t)$ are the cumulative confirmed cases, cumulative cures and cumulative deaths, respectively. Then, we have
$$N(t+1) - N(t) = q(t)\,N(t), \tag{1}$$
where $q(t)$ is the growth rate of the number of existing infections at time $t$. It follows from Equation (1) that
$$q(t) = \frac{N(t+1)}{N(t)} - 1. \tag{2}$$
A growth rate computed from the existing cases on only two adjacent days is not robust and is vulnerable to data fluctuations. Therefore, a sliding window period $k$ is introduced into Equation (1) to obtain the following new dynamic growth rate:
$$q(t) = \left(\frac{N(t+1)}{N(t-k+1)}\right)^{1/k} - 1, \tag{3}$$
which is similar to the average growth rate. It follows that $q(t) > -1$. Equation (3) is based on the geometric average of the ratio of the number of existing infections over a sliding window period, so the calculated results are more robust. The corresponding dynamic growth rate sequence becomes smoother as the sliding window period increases.
To facilitate the calculation, the discrete value of the dynamic transmission rate is given by
$$h_t = q(t) + 1, \tag{4}$$
where $h_t$ is non-negative. It should be noted that when $h_t$ is close to 0, the epidemic has basically ended, that is, the cumulative confirmed cases are no longer changing. In addition, we define the special case $h_t = 1$ as the inflection point of the epidemic [18]. We record the earliest day of the dataset as 1 and the most recent day as $T$, so it follows that $t \in [k, T]$.
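As an illustration of Equations (3) and (4), the following minimal Python sketch computes the discrete dynamic transmission rate from a list of existing cases. The function name and the list-based data layout are assumptions made for this example and are not taken from the original implementation. Because Equation (3) needs $N(t+1)$, the sketch returns values for $t = k, \ldots, T-1$ when only $N(1), \ldots, N(T)$ are available.

```python
def dynamic_transmission_rate(existing, k):
    """Return {t: h_t} computed from Equations (3) and (4).

    existing: list holding N(1), ..., N(T), so existing[0] is N(1).
    k: sliding window period (positive integer).
    Assumes every N(t) is strictly positive.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    T = len(existing)
    rates = {}
    for t in range(k, T):            # 1-indexed days t = k, ..., T-1
        newest = existing[t]         # N(t+1)
        oldest = existing[t - k]     # N(t-k+1)
        # h_t = q(t) + 1 = (N(t+1) / N(t-k+1)) ** (1/k)
        rates[t] = (newest / oldest) ** (1.0 / k)
    return rates


# Example: cases growing by 10% per day give h_t close to 1.1 for every t.
print(dynamic_transmission_rate([100.0, 110.0, 121.0, 133.1], k=2))
```
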
Given the discrete value of the dynamic transmission rate $h_t$, selecting an effective function $f(t)$ to fit $h_t$ is the key to accurately predicting a COVID-19 outbreak. Some well-known fitting functions $f(t)$, listed in Table 1, are considered in this paper.
It should be noted that the parameters $\beta_1$ and $\beta_2$ of $f_1(t)$ reflect, respectively, the severity of the initial epidemic and the degree of human interference in the development of the epidemic, including the effectiveness of preventive measures and the abundance of medical resources. For the three-parameter logarithmic function $f_2(t)$, the parameters $\beta_1$ and $\beta_2$ play a similar role to the corresponding parameters of $f_1(t)$, and $\beta_3$ reflects the number of people infected in the initial epidemic. For the fitting functions $f_3(t)$ and $f_4(t)$, however, the parameters of $f_3(t)$ can be positive or negative, and the parameters of $f_4(t)$ fluctuate strongly. In short, the parameters of $f_1(t)$ and $f_2(t)$ are highly interpretable but their prediction abilities are poor, whereas the parameters of $f_3(t)$ and $f_4(t)$ are poorly interpretable but their prediction abilities are strong (shown in Section 3.4).
Therefore, we use the functions $f_i(t)$ ($i = 1, 2, 3, 4$) to fit the discrete value of the dynamic transmission rate $h_t$ and build a nonlinear weighted least squares model as follows:
$$\arg\min \sum_{t=1}^{n} \theta_t \left[f_i(t) - h_t\right]^2, \quad i = 1, 2, \ldots, 4, \tag{5}$$
where $\theta_t = \exp\{-0.1(T - t)\}$ is the weighting function. By solving the above model, we can obtain the values of the unknown parameters in $f_i(t)$ ($i = 1, 2, 3, 4$), so that the concrete form of $h_t$ can be obtained.
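To make the weighted objective in Equation (5) concrete, the sketch below evaluates it for a two-parameter power curve and minimizes it by brute-force grid search. Both choices are assumptions made only to keep the example self-contained: `power_curve` is a hypothetical stand-in for one of the fitting functions in Table 1, and the grid search replaces the nonlinear least-squares solver used in the experiments.

```python
import math


def power_curve(t, beta1, beta2):
    # Hypothetical fitting function for illustration: f(t) = beta1 * t**(-beta2).
    return beta1 * t ** (-beta2)


def weighted_sse(params, days, rates, T, decay=0.1):
    """Objective of Equation (5): sum over t of theta_t * (f(t) - h_t)**2."""
    total = 0.0
    for t, h in zip(days, rates):
        theta = math.exp(-decay * (T - t))   # recent days get weights near 1
        total += theta * (power_curve(t, *params) - h) ** 2
    return total


def grid_search_fit(days, rates, beta1_grid, beta2_grid):
    """Brute-force stand-in for a nonlinear least-squares solver."""
    T = max(days)
    candidates = ((b1, b2) for b1 in beta1_grid for b2 in beta2_grid)
    return min(candidates, key=lambda p: weighted_sse(p, days, rates, T))


days = list(range(1, 21))
rates = [power_curve(t, 1.5, 0.2) for t in days]   # synthetic h_t values
print(grid_search_fit(days, rates, [1.0, 1.25, 1.5, 1.75], [0.1, 0.2, 0.3]))
```

With a residual-based solver, the same weighting can be obtained by multiplying each residual $f_i(t) - h_t$ by $\sqrt{\theta_t}$ before the squares are summed.
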
In order to make full use of the data during the optimal sliding window period, the number of existing cases is predicted by
$$\hat{N}(t) = \frac{1}{k}\sum_{i=1}^{k} N(t-i) \times \left(\hat{h}_{t-1}\right)^{i}, \tag{6}$$
where $\hat{h}_{t-1}$ and $\hat{N}(t)$ represent the predicted value of the dynamic transmission rate at time $t-1$ and the estimated number of existing cases at time $t$, respectively. When $N(t-i)$ is unknown, we use $\hat{N}(t-i)$ instead.
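The recursion in Equation (6) can be sketched as follows. The function name and argument layout are assumptions for this example; the predicted rates are taken as given, and each forecast is appended to the series so that later steps can use it in place of an unknown $N(t-i)$.

```python
def predict_existing_cases(history, predicted_rates, k):
    """Forecast existing cases with Equation (6).

    history: list holding N(1), ..., N(T), with at least k entries.
    predicted_rates: predicted_rates[j] is h_hat_{t-1} for the (j+1)-th
        forecast day t.
    k: sliding window period.
    """
    if len(history) < k:
        raise ValueError("history must contain at least k values")
    series = list(history)
    forecasts = []
    for h_prev in predicted_rates:
        # series[-i] is N(t-i); each term extrapolates i days forward.
        estimate = sum(series[-i] * h_prev ** i for i in range(1, k + 1)) / k
        forecasts.append(estimate)
        series.append(estimate)   # later steps reuse the estimate as N(t-i)
    return forecasts


# Example: two forecast days with a constant predicted rate of 1.1.
print(predict_existing_cases([100.0, 110.0, 121.0], [1.1, 1.1], k=2))
```
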

2.2. Support Vector Regression

The support vector machine (SVM) is one of the most important predictive models; it has been widely used in various fields and has achieved great success. In addition, it has the advantages of fast learning, global optimization and strong generalization ability. In particular, SVR is an important application of SVM. The main idea is to introduce an ε -insensitive loss function in SVM to adapt to the regression problem and use a kernel function to map the sample set to the high-dimensional feature space to achieve nonlinear regression. Due to its powerful fitting and generalization abilities, SVR has been widely used in various fields, such as in industry [21,22] and atmospheric field [23,24].
Let $D = \{(x_i, y_i)\}_{i=1}^{N}$ be the learning sample set, where $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$ represent the input values and the corresponding desired output, respectively, and $N$ is the number of samples. The general form of the regression function of SVR can be formulated as
$$d(x) = \omega \cdot x + b, \tag{7}$$
where $\omega \in \mathbb{R}^n$ and $b$ stand for the weight vector and the offset, respectively. The original problem of SVR can be transformed into the following optimization problem:
$$\begin{aligned} \min_{\omega,\, b,\, \xi_i^*,\, \xi_i} \quad & \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{N}(\xi_i^* + \xi_i) \\ \text{s.t.} \quad & (\omega \cdot x_i) + b - y_i \le \varepsilon + \xi_i, \\ & y_i - (\omega \cdot x_i) - b \le \varepsilon + \xi_i^*, \quad i = 1, 2, \ldots, N, \\ & \xi_i,\, \xi_i^* \ge 0, \end{aligned} \tag{8}$$
where the slack variables $\xi_i$ and $\xi_i^*$ allow the SVR to tolerate noise or errors, and $C$ is a pre-set penalty factor. Through constructing a Lagrangian function and using the KKT conditions, the entire problem can be formulated as the following optimization problem:
$$\begin{aligned} \underset{\alpha_i,\, \alpha_i^*}{\text{minimize}} \quad & \frac{1}{2}\sum_{i,j=1}^{N}(\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)(x_i \cdot x_j) + \varepsilon\sum_{i=1}^{N}(\alpha_i^* + \alpha_i) - \sum_{i=1}^{N} y_i(\alpha_i^* - \alpha_i) \\ \text{s.t.} \quad & \sum_{i=1}^{N}(\alpha_i^* - \alpha_i) = 0, \quad 0 \le \alpha_i,\, \alpha_i^* \le C, \end{aligned} \tag{9}$$
where $\alpha_i$ and $\alpha_i^*$ are Lagrange multipliers.
Let $(\hat{\alpha}^T, (\hat{\alpha}^*)^T) = (\hat{\alpha}_1, \ldots, \hat{\alpha}_N, \hat{\alpha}_1^*, \ldots, \hat{\alpha}_N^*)$ be the optimal solution of the optimization problem (9). Then the decision function can be expressed as follows:
$$d(x) = \sum_{i=1}^{N}\left(\hat{\alpha}_i^* - \hat{\alpha}_i\right)(x \cdot x_i) + \hat{\alpha}_0, \tag{10}$$
where $\hat{\alpha}_i, \hat{\alpha}_i^* \in [0, C]$ and $\hat{\alpha}_0$ is a scalar in $d(x)$ called the bias term. More discussion of SVR and SVM can be found in the work by Vapnik et al. [25,26] and the references therein.
It is well known that the performance of an SVR model depends mainly on the kernel function; common choices include the linear kernel function, the polynomial kernel function, and the Gaussian kernel function. Peng et al. [27] reported that, when predicting the epidemic pattern, the Gaussian kernel function tended to overfit, whereas the linear kernel function had the worst in-sample but the best out-of-sample performance. Therefore, we combine these two kernel functions linearly, and the combined kernel function is given by
$$K(x_i, x_j) = \lambda \exp\left(-\gamma\,\|x_i - x_j\|^2\right) + (1 - \lambda)\left[\gamma\,(x_i \cdot x_j) + 2\right], \tag{11}$$
where $\lambda$ is the weight coefficient, $\gamma$ represents the kernel parameter, and $x_i$ and $x_j$ represent the $i$-th and $j$-th samples, respectively.
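The combined kernel in Equation (11) can be written as a small plain-Python sketch. The function names are hypothetical, the default $\lambda = 0.9$ follows the weight selected in Section 3.3, and the default $\gamma = 0.5$ is an arbitrary value chosen only for illustration.

```python
import math


def combined_kernel(x_i, x_j, lam=0.9, gamma=0.5):
    """Equation (11): weighted sum of a Gaussian and a linear-type kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    dot = sum(a * b for a, b in zip(x_i, x_j))
    gaussian = math.exp(-gamma * sq_dist)
    linear = gamma * dot + 2
    return lam * gaussian + (1 - lam) * linear


def gram_matrix(X, Y, **kernel_params):
    """Kernel values for every pair of rows in X and Y."""
    return [[combined_kernel(x, y, **kernel_params) for y in Y] for x in X]


# Example: kernel matrix for three one-dimensional samples.
samples = [[0.0], [0.5], [1.0]]
for row in gram_matrix(samples, samples):
    print([round(v, 4) for v in row])
```

A function returning such a matrix is the usual way to supply a custom kernel to an SVR implementation; scikit-learn's `SVR`, for instance, accepts a callable `kernel` argument that computes the kernel matrix between two sets of samples.
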
SVR has many advantages in tolerating errors and in solving nonlinear and high-dimensional pattern recognition problems, and these advantages can be used to improve forecasting. It is also well suited to forecasting COVID-19 cumulative cases, especially when the sample sizes are small [28]. In this paper, SVR is used to combine multiple single models, i.e., multiple fitting functions, nonlinearly in order to construct a nonlinear combinational dynamic transmission rate model.

2.3. Forecasting Effective Measure

FEM was proposed by Chen in 2001 [29] as an indicator to measure the effectiveness of a model; this can be used to select the optimal sub-models to build a combination forecasting model. In addition, FEM represents the average and comprehensive accuracy of the model, which can be described by the mean of accuracy and the standard deviation reflecting the degree of dispersion.
Let $S = \{(h_t, \hat{h}_{it})\}_{t=k}^{T}$ ($i = 1, 2, 3, 4$) be a learning sample, and
$$a_{it} = \begin{cases} 1 - |\varepsilon_{it}|, & |\varepsilon_{it}| \le 1, \\ 0, & |\varepsilon_{it}| > 1, \end{cases} \quad i = 1, 2, 3, 4, \ t \in [k, T], \tag{12}$$
where $h_t$ is the actual value at time $t$ and $\varepsilon_{it} = (h_t - \hat{h}_{it}) / h_t$ is the relative error of the $i$-th model at time $t$. Then the FEM of the $i$-th model $M_i$ is
$$\mathrm{FEM}(M_i) = E(M_i)\,\bigl(1 - \sigma(M_i)\bigr), \tag{13}$$
where $E(M_i)$ and $\sigma(M_i)$ represent the mathematical expectation and the standard deviation of the accuracy of the $i$-th model $M_i$, respectively, which are defined by
$$E(M_i) = \sum_{t=k}^{T} \frac{\theta_t}{\sum_{t=k}^{T}\theta_t}\, a_{it} \tag{14}$$
and
$$\sigma(M_i) = \left[\sum_{t=k}^{T} \frac{\theta_t}{\sum_{t=k}^{T}\theta_t}\,\bigl(a_{it} - E(M_i)\bigr)^2\right]^{1/2}, \tag{15}$$
where $\theta_t$ is the same as in Equation (5).
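Equations (12) to (15) translate directly into a short function. The sketch below is an illustrative reading of those formulas rather than the original code; the function name is hypothetical, and the inputs are assumed to be ordered in time so that the last element corresponds to $t = T$.

```python
import math


def forecasting_effective_measure(actual, predicted, decay=0.1):
    """FEM of one model, following Equations (12)-(15).

    actual, predicted: h_t and h_hat_t for t = k, ..., T, oldest first.
    decay: the 0.1 factor in theta_t = exp(-0.1 * (T - t)).
    """
    n = len(actual)
    # theta_t for each observation; the newest one has T - t = 0.
    raw = [math.exp(-decay * (n - 1 - j)) for j in range(n)]
    total = sum(raw)
    weights = [w / total for w in raw]

    accuracy = []
    for h, h_hat in zip(actual, predicted):
        rel_err = abs((h - h_hat) / h)                          # |epsilon_t|
        accuracy.append(1 - rel_err if rel_err <= 1 else 0.0)   # Equation (12)

    mean = sum(w * a for w, a in zip(weights, accuracy))        # Equation (14)
    std = math.sqrt(
        sum(w * (a - mean) ** 2 for w, a in zip(weights, accuracy))
    )                                                           # Equation (15)
    return mean * (1 - std)                                     # Equation (13)


# Example: a perfect forecast scores close to 1, a biased one scores lower.
print(forecasting_effective_measure([1.2, 1.1, 1.0], [1.2, 1.1, 1.0]))
print(forecasting_effective_measure([1.2, 1.1, 1.0], [1.0, 1.0, 1.0]))
```
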
The fundamental purpose of the combined forecasting is to add additional single models to the existing model to improve the forecasting effect. If the new model cannot improve the accuracy of the combined model (a so-called “redundant” model), then we eliminate it. The algorithm for selecting the optimal sub-model employed in this paper is given in Algorithm 1 below.
Algorithm 1 Selecting the optimal combined model based on FEM and SVR
Input: Sub-models $M_i$ ($i = 1, 2, 3, 4$)
Output: The optimal combined model $M_c$
1: begin
2: Evaluate $\mathrm{FEM}(M_i)$ using Equation (13), $i = 1, 2, 3, 4$.
3: Sort $\mathrm{FEM}(M_i)$: $\mathrm{FEM}(M_{(1)}) \ge \mathrm{FEM}(M_{(2)}) \ge \mathrm{FEM}(M_{(3)}) \ge \mathrm{FEM}(M_{(4)})$.
4: Let $\mathrm{FEM}_{MAX} \leftarrow \mathrm{FEM}(M_{(1)})$, $M_c \leftarrow M_{(1)}$.
5: for $i = 2 \to 4$ do
6:      Get the corresponding estimated dynamic transmission rate by $M_c$ and $M_{(i)}$.
7:      Call SVR, take the estimated and the real dynamic transmission rate as the training input and the desired output of SVR to construct a combination model $M_{c\&(i)}$, and evaluate $\mathrm{FEM}(M_{c\&(i)})$ using Equation (13).
8:      if $\mathrm{FEM}(M_{c\&(i)}) \ge \mathrm{FEM}_{MAX}$ then
9:            Let $\mathrm{FEM}_{MAX} \leftarrow \mathrm{FEM}(M_{c\&(i)})$, and $M_c \leftarrow M_{c\&(i)}$.
10:      end if
11: end for
12: Calculate the number $len(M_c)$ of sub-models in $M_c$.
13: if $len(M_c) = 1$ then
14:      Let $M_c \leftarrow \arg\max\{\mathrm{FEM}(M_{(1)\&(2)}), \mathrm{FEM}(M_{(1)\&(3)}), \mathrm{FEM}(M_{(1)\&(4)})\}$.
15: end if
16: return $M_c$
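The greedy selection in Algorithm 1 can be sketched in Python as follows. The SVR-based combination step is abstracted into a caller-supplied scoring function, `score_combination`, which is a hypothetical interface introduced for this example: it receives a tuple of sub-model names and returns the FEM of their combination.

```python
def select_submodels(single_scores, score_combination):
    """Greedy forward selection of sub-models, following Algorithm 1.

    single_scores: dict mapping each sub-model name to its FEM.
    score_combination: callable taking a tuple of names and returning the
        FEM of the combined model (an SVR combination in the paper).
    """
    # Steps 2-4: rank the single models and start from the best one.
    ranked = sorted(single_scores, key=single_scores.get, reverse=True)
    chosen = (ranked[0],)
    best = single_scores[ranked[0]]

    # Steps 5-11: add a sub-model only if the combination is at least as good.
    for name in ranked[1:]:
        candidate = chosen + (name,)
        score = score_combination(candidate)
        if score >= best:
            chosen, best = candidate, score

    # Steps 12-15: if nothing was added, fall back to the best pair that
    # contains the top-ranked model.
    if len(chosen) == 1:
        pairs = [(ranked[0], name) for name in ranked[1:]]
        chosen = max(pairs, key=score_combination)
    return chosen


# Example with made-up FEM values.
scores = {"f1": 0.90, "f2": 0.85, "f3": 0.95, "f4": 0.80}
table = {("f3", "f1"): 0.97, ("f3", "f1", "f2"): 0.96, ("f3", "f1", "f4"): 0.98}
print(select_submodels(scores, lambda names: table.get(names, 0.0)))
```
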

2.4. Process Steps

The forecasting framework of INCDTRM is carried out in seven main stages, which are shown in Figure 1 and explained below.
Input: Cumulative confirmed cases $L(t)$, cumulative deaths $D(t)$, cumulative cures $K(t)$, $t = 1, 2, \ldots, T$.
Output: Estimates of the existing cases $\hat{N}(t)$, $t = T+1, T+2, \ldots$.
Step 1. Calculate the existing cases $N(t) = L(t) - K(t) - D(t)$, $t = 1, 2, \ldots, T$.
Step 2. Choose the best sliding window period $k$.
Step 3. Calculate $h_t$ using Equations (3) and (4), and divide the result into two parts. Three quarters of the data are used as the training set, where $t = k, k+1, \ldots, m$, and the remainder is used as the validation set, where $t = m+1, \ldots, T$.
Step 4. Fit the training set with the four fitting functions $f_i(t)$ ($i = 1, 2, 3, 4$), and obtain the estimated dynamic transmission rate $\hat{h}_{it}$, $t \in [k, T]$.
Step 5. Feed $\hat{h}_{it}$ ($t \in [k, m]$) into SVR, and use SVR to estimate the dynamic transmission rate $\hat{h}_{it}$ ($t \in [k, T]$) corresponding to the validation set. Then select the optimal combined model based on FEM and SVR using Algorithm 1.
Step 6. Predict the dynamic transmission rate with the optimal combined model $M_c$, and calculate the estimated number of existing cases after the $T$-th period using Equation (6).
Step 7. Evaluate the performance of the model by the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE), which are given by
$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\left|N(t) - \hat{N}(t)\right|, \tag{16}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(N(t) - \hat{N}(t)\right)^2} \tag{17}$$
and
$$\mathrm{MAPE} = \frac{1}{T}\sum_{t=1}^{T}\left|\frac{N(t) - \hat{N}(t)}{N(t)}\right| \times 100\%, \tag{18}$$
respectively, where $\hat{N}(t)$ and $N(t)$ are the estimated and actual numbers of existing cases, respectively.
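The three error measures in Equations (16) to (18) can be computed together, as in the following minimal sketch. The function name is hypothetical, and the actual values are assumed to be nonzero so that the MAPE is defined.

```python
import math


def forecast_errors(actual, estimated):
    """Return (MAE, RMSE, MAPE in percent) for two equally long sequences."""
    n = len(actual)
    diffs = [a - e for a, e in zip(actual, estimated)]
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mape = sum(abs(d / a) for d, a in zip(diffs, actual)) / n * 100
    return mae, rmse, mape


# Example: errors of -10 and +20 on actual values of 100 and 200.
print(forecast_errors([100.0, 200.0], [110.0, 180.0]))
```
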
It should be noted that in Step 5, the training set is used to train the SVR and the validation set is used to select the corresponding model parameters. Therefore, the calculation of FEM takes into account both the fitting ability and the prediction ability of the model, which effectively avoids overfitting.

3. Results and Discussion

All our numerical experiments are carried out on a PC with an AMD Ryzen 7 4800U CPU at 1.80 GHz and 16 GB of physical memory. The PC runs Python 3.7.2 on the Windows 10 Enterprise 64-bit operating system. The SVR regression model is taken from the svm module of the scikit-learn Python library, and the nonlinear fitting model uses the leastsq function of the SciPy Python library.

3.1. Dataset Description

The COVID-19 data repository (https://github.com/CSSEGISandData/COVID-19, accessed on 17 September 2021) used in the study was obtained from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) [30]. In this paper, we consider the following countries: USA, Canada, Germany, Italy, France, Spain, South Korea and Iran. The cumulative confirmed cases and deaths in each country, as well as the period from the first and last reports, are listed in Table 2.
17 May 2020 is the end date of the first wave of the epidemic in most countries. When the sliding window period is set to 7, the resulting dynamic transmission rates of the eight countries are shown in Figure 2.
Figure 2 shows that most countries experienced three stages: slow growth in the early stage, a rapid increase in the middle stage and a slow decline in the later stage. This paper mainly studies the third stage in these countries, i.e., the data after the dynamic transmission rate reaches its peak. The subsets before 28 April 2020 are used to train the model, which then predicts cases from 28 April 2020 to 17 May 2020. The data are normalized before training the SVR, and the normalization formula is given by
$$h_{it}^{*} = \frac{\hat{h}_{it} - \min(\hat{h}_{it})}{\max(\hat{h}_{it}) - \min(\hat{h}_{it})}. \tag{19}$$
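The min-max normalization in Equation (19) amounts to a one-line rescaling, sketched below. The function name is hypothetical, and the explicit check for a constant sequence is an added safeguard that the formula itself does not mention.

```python
def min_max_normalize(values):
    """Rescale a sequence to [0, 1] as in Equation (19)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant sequence")
    return [(v - lo) / (hi - lo) for v in values]


# Example: estimated transmission rates mapped onto [0, 1].
print(min_max_normalize([1.0, 1.5, 2.0]))
```
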

3.2. Choice of Sliding Window Period

The best sliding window period can effectively improve the prediction ability of the model. The basic idea of selecting the best sliding window period employed in this paper is to calculate the average mean absolute error (AVG_MAE) of four single models under different sliding window periods, and then determine the best sliding window period k according to the minimum AVG_MAE. The algorithm used for selecting the sliding window period in this paper is given in Algorithm 2 below.
Algorithm 2 Selecting the best sliding window period
Input: Existing cases $N(t)$, $t = 1, 2, \ldots, T$
Output: The best sliding window period $k$
1: begin
2: Divide the data into two parts, a training set where $t = 1, 2, \ldots, T-7$ and a testing set where $t = T-6, T-5, \ldots, T$.
3: for $k = 1 \to 7$ do
4:     Evaluate the dynamic transmission rate using Equations (3) and (4) with the training set.
5:     Build four prediction models using Equation (5).
6:     Predict the dynamic transmission rate using the four single models $f_i(t)$, $i = 1, 2, \ldots, 4$, and evaluate the estimated values of existing cases using Equation (6), $t = T-6, T-5, \ldots, T$.
7:     Evaluate the MAE of the four models using Equation (16), and calculate AVG_MAE.
8: end for
9: return the $k$ with the minimum AVG_MAE as the best $k$
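The loop structure of Algorithm 2 can be sketched as follows. The fitting and forecasting steps are abstracted into caller-supplied functions with a hypothetical interface, `forecast(train, k, horizon)`, so the example shows only how the candidate windows are compared by their average MAE.

```python
def best_window(existing, forecasters, max_k=7, horizon=7):
    """Return (best k, {k: AVG_MAE}) following Algorithm 2.

    existing: list holding N(1), ..., N(T).
    forecasters: callables f(train, k, horizon) that return `horizon`
        forecasts of existing cases (hypothetical interface).
    """
    train, test = existing[:-horizon], existing[-horizon:]
    avg_mae = {}
    for k in range(1, max_k + 1):
        maes = []
        for forecast in forecasters:
            predicted = forecast(train, k, horizon)
            mae = sum(abs(a - p) for a, p in zip(test, predicted)) / horizon
            maes.append(mae)
        avg_mae[k] = sum(maes) / len(maes)
    # The best window is the one with the smallest average MAE.
    return min(avg_mae, key=avg_mae.get), avg_mae


# Toy forecasters whose error is smallest at k = 3 (for demonstration only).
toy = [
    lambda train, k, horizon: [train[-1] + abs(k - 3)] * horizon,
    lambda train, k, horizon: [train[-1] + 2 * abs(k - 3)] * horizon,
]
print(best_window([10.0] * 20, toy)[0])
```
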
Using Algorithm 2, we obtain the best sliding window periods for eight countries (Table 3). The results show that the best sliding window periods for North American and European countries are larger than those for Asia. The result is attributed to the sliding window period being capable of effectively suppressing data fluctuations due to the large numbers of existing cases in North American and European countries.

3.3. Weight and Parameters Setting in SVR

Reasonable parameters for SVR can effectively improve the model’s fitting and predictive ability. In this paper, we use the grid search to select the weight λ of the combined kernel function, and Figure 3 shows the average MAPE of the epidemic prediction in eight countries under different weights λ .
It can be seen from Figure 3 that with the increase of weight, the average MAPE shows a decreasing trend, and when the weight is 0.9, the predicted average MAPE takes the minimum value, so the weight of the combined kernel function is set to 0.9 in this paper.
In addition, the particle swarm optimization (PSO) algorithm is used to search the optimal parameters for SVR. The initial values of the parameters in the PSO algorithm are set as shown in Table 4 and the results are displayed in Table 5.

3.4. Comparative Analysis of Different Models

To verify the validity of each model, we present the experimental results in detail in this subsection; at the same time, we also compare the performance results of our proposed method with the six different regression models, including four single methods, DGRM [19], and nonlinear combinational dynamic transmission rate model (NCDTRM). In Table 6, we present the predictive performance on the testing set of seven models (from 28 April 2020 to 17 May 2020). Table 6 shows that INCDTRM achieves the best results in multiple countries, followed by NCDTRM, and both combinational models obtain better prediction performance than single models. The MAPEs of INCDTRM are calculated as 1.20%, 3.11%, 7.81%, 3.97%, 8.66%, and 4.42% for USA, Canada, Germany, Italy, Spain and South Korea, respectively. These results illustrate the accuracy of INCDTRM in estimating the number of existing COVID-19 daily cases. Among them, the MAE and RMSE of INCDTRM are the smallest for Germany, but the MAPE is larger than that of the model f 4 ( t ) . This is because when INCDTRM is used to predict the epidemic in Germany, the date with the largest number of existing cases is relatively accurate, but the date of the smallest number of existing cases has a larger prediction error.
Although the prediction accuracy of INCDTRM is not as good as that of SVR when predicting the epidemic pattern in USA and Iran, the overall accuracy of INCDTRM is better than that of the other six models, and the AVG_MAPE of INCDTRM is 10.07%, indicating the reliability and feasibility of the FEM in selecting the optimal sub-model.
It should be noted that when forecasting the epidemic pattern in France and Iran, all models perform very poorly. This is because the number of confirmed cases in these two countries increased unpredictably and a second wave of the epidemic appeared. Moreover, the fitting functions used in this paper are all monotonically decreasing. If the country being predicted experiences a second wave, i.e., the dynamic transmission rate increases instead of decreasing, then all models in this paper will fail. This is a defect of this type of model.

3.5. Inflection Point Prediction

In this subsection, we estimate the inflection points of the epidemics in these countries. The inflection point is defined as the time at which the number of existing cases stops increasing and starts to decline, i.e., the dynamic transmission rate is equal to 1 [18]. The epidemic has the following properties around the inflection point: before the inflection point, the epidemic is growing; at the inflection point, the epidemic puts the greatest pressure on prevention efforts and the medical system; after the inflection point, the epidemic eases, and the pressure on prevention and control gradually decreases. The emergence of an inflection point indicates that the number of existing infections has reached its maximum and has since been declining, which means that the epidemic is turning from bad to good. Using this model to estimate the inflection point can provide a reference for understanding the spreading trend of the epidemic and can guide the timely release of relevant policies. The results are displayed in Table 7 and Figure 4.
Figure 4 shows that there is a delay between the estimated inflection point and the actual inflection point. This is due to the fact that the model introduces the sliding window period k, which delays the transmission and updating of information, and the delay increases with the size of the sliding window period. However, most of the inflection point estimation errors are within one week.
Figure 4a,b show that the epidemic spread relatively late in North American countries. The fitted curve for the USA is relatively flat: the change in the dynamic transmission rate from 22 April 2020 to 27 April 2020 is very small, so the decline of the curve in the subsequent prediction is not obvious and is approximately a straight line, which results in a delayed estimate of the inflection point. The outbreak in Canada fluctuates considerably but shows an overall downward trend, and the inflection point is estimated as 26 May 2020.
Figure 4c–f show that European countries did not pay enough attention to the epidemic in the early stages, so the epidemic spread more rapidly, meaning that the dynamic transmission rate was larger. Then, with the subsequent strengthening of prevention and control measures, the epidemic trend improved. The inflection points of Germany, Italy, France and Spain are estimated as 14 April 2020, 24 April 2020, 27 April 2020 and 22 April 2020, respectively. It should be noted that the dynamic transmission rate of France hovered around 1 from 15 April to 27 April, which delayed the estimated inflection point.
Figure 4g,h show that in the epidemic curves for South Korea and Iran, the declines are larger in the early days, meaning that the slopes of the curves are comparatively large. This is mainly due to the two countries implementing strong prevention and control measures in the early stages, and the epidemic situation is quickly brought under control. The inflection points for South Korea and Iran are estimated to be 14 March 2020 and 5 April 2020, respectively, earlier than those for European and North American countries. However, the outbreak in Iran rebounds again in May, and this is the main reason for the relatively large error in the forecast.
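As a small illustration of the inflection-point definition used in this subsection, the sketch below scans a sequence of dynamic transmission rates and reports the first day on which the rate is no greater than 1. Treating the first such day as the inflection point is an assumption made for this example, since discrete data rarely hit the value 1 exactly; the function name is hypothetical.

```python
def first_inflection_day(rates):
    """Return the first day t with h_t <= 1, or None if there is none.

    rates: dict mapping day t to the dynamic transmission rate h_t.
    """
    for day in sorted(rates):
        if rates[day] <= 1.0:
            return day
    return None


# Example: existing cases stop growing on day 3.
print(first_inflection_day({1: 1.20, 2: 1.05, 3: 0.99, 4: 0.97}))
```
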

3.6. Sensitivity Analysis

Due to the deviation of epidemic statistics and the lag of data updating in various countries, whether the prediction model can effectively resist noise, that is, the impact of small changes in the original data on the output of the model, is also an important evaluation index in this paper.
The data are processed as follows: the daily number of existing cases is randomly increased or decreased by 0–1% of its own value. Then the output of the model on the perturbed data is compared with the output of the original model, and the relative change rate is given by
$$\tilde{Q} = \frac{1}{T}\sum_{t=1}^{T}\left|\frac{\tilde{N}(t) - \hat{N}(t)}{\hat{N}(t)}\right| \times 100\%, \tag{20}$$
where $\tilde{N}(t)$ and $\hat{N}(t)$ represent the output of the noise model and the output of the original model, respectively. Ten experiments are run for each country, and the experimental results are reported in the form of average ± standard deviation.
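The perturbation and the relative change rate in Equation (20) can be sketched as follows. The function names are hypothetical, and the noise is drawn uniformly from plus or minus 0 to 1% of each value, which is one possible reading of the procedure described above.

```python
import random


def perturb(existing, max_fraction=0.01, rng=random):
    """Randomly raise or lower each value by 0 to max_fraction of itself."""
    return [n * (1 + rng.uniform(-max_fraction, max_fraction)) for n in existing]


def relative_change_rate(noisy_output, original_output):
    """Equation (20): mean absolute relative change, in percent."""
    ratios = [abs((a - b) / b) for a, b in zip(noisy_output, original_output)]
    return sum(ratios) / len(ratios) * 100


# Example: compare perturbed inputs directly with the originals.
rng = random.Random(0)                 # fixed seed for reproducibility
original = [1000.0] * 50
noisy = perturb(original, rng=rng)
print(relative_change_rate(noisy, original) <= 1.0)
```

In a full experiment the two arguments of `relative_change_rate` would be the model outputs obtained from the perturbed and the original data, and the comparison would be repeated over several random draws.
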
As shown in Table 8, in terms of the dynamic transmission rate, the average rate of change in all countries is less than 0.3%. Except for Germany and France, the average variability of the existing number of cases is no greater than 3% in every country. For Germany, the early dynamic transmission rate fluctuates markedly, and the later dynamic transmission rate fluctuates periodically (as shown in Figure 4), which explains the larger average rate of change. It is noteworthy that the standard deviation of the change rate in the existing number of cases in Italy is greater than that in Germany, but the average ± standard deviation is better than that in Germany. The average change rate of the dynamic transmission rate in all experiments is smaller than that of the existing infected population. This is because the base number of existing cases is large, so a small change in the dynamic transmission rate causes a large change in the existing infected population. On the whole, our model has good anti-noise ability and stability when dealing with data fluctuations.

3.7. Global Epidemic Forecast

Accurate prediction of the global epidemic is the key to effectively grasping the overall trend of the COVID-19 pandemic. Based on the high performance of INCDTRM, we analyze and forecast the global epidemic. Since the global epidemic dataset is large and covers a long period, the late dynamic transmission rate is almost linear, which our model is unable to predict. Therefore, the data are preprocessed as follows. First, we select the data from 1 April to 30 June 2021. Then, the existing case sequence is calculated, and the sequence is subtracted from the number of the existing cases on 31 March 2021. Finally, we take the resulting sequence as the experimental dataset. Figure 5 shows the trend of the global epidemic as forecast by INCDTRM.
As shown in Figure 5a, the spread of the global epidemic is slowing, but over time the decline of the global dynamic transmission rate begins to level off. The inflection point is predicted to be 10 May 2021, whereas the actual inflection point was 9 May 2021, indicating that the model in this paper has high reliability. In addition, the number of existing infections in the global epidemic is predicted for 1 June to 30 June 2021, a total of 30 days. In Figure 5b, the number of existing infections in the global epidemic is decreasing, and the estimated number of existing cases is consistent with the actual trend. The MAE, RMSE and MAPE of the estimates are 141,374.15, 182,765.24 and 0.522%, respectively.
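For readers who want to reproduce error measures of this kind, the sketch below uses the standard textbook definitions of MAE, RMSE and MAPE. It is an illustration only: the paper defines its criteria in an earlier section and may differ in detail, and the numbers in the example are made up rather than taken from the global dataset.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)

# Made-up values for illustration only (not data from the paper).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(f"MAE  = {mae(actual, predicted):.2f}")
print(f"RMSE = {rmse(actual, predicted):.2f}")
print(f"MAPE = {mape(actual, predicted):.2f}%")
```

MAE and RMSE are expressed in numbers of cases, so they scale with the size of the epidemic, whereas MAPE is scale-free. This is why a MAPE of 0.522% can coexist with an MAE above 140,000 for a global series.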
However, the number of existing infections in the global epidemic fluctuated in June 2021. Therefore, we should continue to urge people to wear masks and avoid large-scale gatherings to prevent a rebound of the epidemic. At the same time, the development of a vaccine for COVID-19 is also an effective way to suppress the spread of infectious diseases [31]. In addition, we can update the prediction model in real time as new daily data arrive, and thus track the spread of the global epidemic in a timely manner.

4. Conclusions

In this paper, we have presented an INCDTRM based on FEM and SVR for analyzing and predicting the COVID-19 pandemic in eight countries. The experimental results show that INCDTRM has smaller prediction error and stronger generalization ability than the single prediction models, DGRM and NCDTRM used previously; forecast errors for the epidemics in the USA, Canada, Italy and South Korea were within 5%. This shows the rationality of using the dynamic transmission rate to replace the basic infection number $R_0$, and our model can thus be utilized for predicting the COVID-19 epidemic.
Furthermore, we also used INCDTRM to model the global COVID-19 pandemic. The experimental results predict that the inflection point of the global epidemic is 10 May 2021, and the MAE, RMSE and MAPE of the estimates are 141,374.15, 182,765.24 and 0.522%, respectively. As Ferguson et al. [32] pointed out, we are now at a critical moment of the epidemic, and any slackening in prevention and control will lead to a rebound in the spread of the epidemic.
It should be noted that this paper has the following limitations: (1) Due to the monotonicity of the fitting functions, INCDTRM is not suitable for predicting a rebound in an epidemic; (2) The weights of the combined kernel function and the initial parameters of PSO need to be set according to experience; (3) The spread of the COVID-19 epidemic is very complex and is affected by social factors [33], population [34], climate, environment [35] and other factors, but this paper considers only the epidemic data itself, without the other factors that may affect the spread of the epidemic.
In view of these limitations, future work will include: (1) Finding a suitable multi-stage development model for COVID-19 as an infectious disease; (2) Using more effective multiple-kernel learning models and adaptive optimization algorithms; (3) Collecting further relevant data and conducting research with data assimilation or SEIR and its extended models.

Author Contributions

Conceptualization, X.X.; Fund acquisition, Z.Y. and G.W.; Methodology, X.X. and K.L.; Supervision, Z.Y. and G.W.; Writing—original draft preparation, X.X.; Writing—review & editing, X.X., K.L. and G.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Nos. 11971302, 62072296) and National Statistical Science Research Project of China (No. 2020LY067).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available at the following link: https://github.com/CSSEGISandData/COVID-19 (accessed on 17 September 2021).

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

  1. WHO Director-General, World Health Organization. 11 February 2020. Available online: https://www.who.int/director-general/speeches/detail/who-director-general-s-remarks-at-the-media-briefing-on-2019-ncov-on-11-february-2020 (accessed on 17 September 2021).
  2. Yang, Z.; Zeng, Z.; Wang, K.; Wong, S.S.; Liang, W.; Zanin, M.; Liu, P.; Cao, X.; Gao, Z.; Mai, Z.; et al. Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. J. Thorac. Dis. 2020, 12, 165–174.
  3. Wu, J.; Leung, K.; Leung, G. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling study. Lancet 2020, 395, 689–697.
  4. Lopez, L.; Rodo, X. A modified SEIR model to predict the COVID-19 outbreak in Spain and Italy: Simulating control scenarios and multi-scale epidemics. Results Phys. 2021, 21, 103746.
  5. Mahajan, A.; Sivadas, N.; Solanki, R. An epidemic model SIPHERD and its application for prediction of the spread of COVID-19 infection in India. Chaos Solitons Fractals 2020, 140, 110156.
  6. Sanyi, T.; Biao, T.; Bragazzi, N.L.; Fan, X.; Tangjuan, L.; Sha, H.; Pengyu, R.; Xia, W.; Xiang, C.; Zhihang, P.; et al. Analysis of COVID-19 epidemic traced data and stochastic discrete transmission dynamic model. Sci. Sin. Math. 2020, 50, 1071. (In Chinese)
  7. Ankarali, H.; Ankarali, S.; Caskurlu, H.; Cag, Y.; Arslan, F.; Erdem, H.; Vahaboglu, H. A statistical modeling of the course of COVID-19 (SARS-CoV-2) outbreak: A comparative analysis. Asia-Pac. J. Public Health 2020, 32.
  8. Liu, Z. Uncertain growth model for the cumulative number of COVID-19 infections in China. Fuzzy Optim. Decis. Mak. 2020, 20, 229–242.
  9. Smirnova, A.; DeCamp, L.; Chowell, G. Mathematical and Statistical Analysis of Doubling Times to Investigate the Early Spread of Epidemics: Application to the COVID-19 Pandemic. Mathematics 2021, 9, 625.
  10. Parbat, D.; Chakraborty, M. A python based support vector regression model for prediction of COVID19 cases in India. Chaos Solitons Fractals 2020, 138, 109942.
  11. Tuli, S.; Tuli, S.; Tuli, R.; Buyya, R. Predicting the growth and trend of COVID-19 pandemic using machine learning and cloud computing. Internet Things 2020, 11, 100222.
  12. Zeroual, A.; Harrou, F.; Dairi, A.; Sun, Y. Deep learning methods for forecasting COVID-19 time-series data: A comparative study. Chaos Solitons Fractals 2020, 140, 110121.
  13. Devaraj, J.; Elavarasan, R.M.; Pugazhendhi, R.; Shafiullah, G.M.; Ganesan, S.; Jeysree, A.K.; Khan, I.A.; Hossain, E. Forecasting of COVID-19 cases using deep learning models: Is it reliable and practically significant? Results Phys. 2021, 21, 103817.
  14. Bezerra, A.; Santos, E. Prediction the daily number of confirmed cases of COVID-19 in Sudan with ARIMA and holt winter exponential smoothing. Int. J. Dev. Res. 2020, 10, 39408–39413.
  15. González, P.; Núñez, C.; Sánchez, J.; Valverde, G.; Velasco, J. Expert System to Model and Forecast Time Series of Epidemiological Counts with Applications to COVID-19. Mathematics 2021, 9, 1485.
  16. Layne, S.; Hyman, J.; Morens, D.; Taubenberger, J. New coronavirus outbreak: Framing questions for pandemic prevention. Sci. Transl. Med. 2020, 12, eabb1469.
  17. Huang, E.; Qiao, F. A data driven time-dependent transmission rate for tracking an epidemic: A case study of 2019-nCoV. Sci. Bull. 2020, 65, 425–427.
  18. Hu, Y.; Liu, Y.; Wu, L.; Wang, J.; Kong, J.; Zhang, Y.; Dai, Y.; Yang, Z. A dynamic transmission rate model and its application in epidemic analysis. Oper. Res. Trans. 2020, 24, 27–42. (In Chinese)
  19. Hu, Y.; Liu, Y.; Wu, L.; Wang, J.; Kong, J.; Zhang, Y.; Dai, Y.; Yang, Z. A dynamic growth rate model and its application in global COVID-19 epidemic analysis. Acta Math. Appl. Sin. 2020, 43, 452–467. (In Chinese)
  20. Xie, X.; Luo, K.; Zhang, Y.; Jin, J.; Lin, H.; Wang, G. Nonlinear combinational dynamic transmission rate model and COVID-19 epidemic analysis and prediction in China. Oper. Res. Trans. 2021, 25, 17–30. (In Chinese)
  21. Chen, Y.; Xu, P.; Chu, Y.; Li, W.; Wu, Y.; Ni, L.; Bao, Y.; Wang, K. Short-term electrical load forecasting using the support vector regression (SVR) model to calculate the demand response baseline for office buildings. Appl. Energy 2017, 195, 659–670.
  22. Wei, C. Chaotic particle swarm optimization algorithm in a support vector regression electric load forecasting model. Energy Convers. Manag. 2009, 50, 105–117.
  23. Yang, W.; Deng, M.; Xu, F.; Wang, H. Prediction of hourly PM2.5 using a space-time support vector regression model. Atmos. Environ. 2018, 181, 12–19.
  24. Moser, G.; Serpico, S. Automatic parameter optimization for support vector regression for land and sea surface temperature estimation from remote sensing data. IEEE Trans. Geosci. Remote Sens. 2009, 47, 909–921.
  25. Smola, A.; Scholkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222.
  26. Vapnik, V.N. Statistical Learning Theory; John Wiley & Sons: New York, NY, USA, 1998.
  27. Peng, Y.; Nagata, M. An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data. Chaos Solitons Fractals 2020, 139, 110055.
  28. Ribeiro, M.; Coelho, L. Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series. Appl. Soft Comput. 2020, 86, 105837.
  29. Chen, Y.; Hou, D. Combination forecasting model based on forecasting effective measure with standard deviate. J. Syst. Eng. 2003, 18, 203–210, 223. (In Chinese)
  30. Dong, E.; Du, H.; Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 2020, 20, 533–534.
  31. Lv, W.; Ke, Q.; Li, K. Dynamical analysis and control strategies of an SIVS epidemic model with imperfect vaccination on scale-free networks. Nonlinear Dyn. 2020, 99, 1507–1523.
  32. Wang, H.; Wang, Y.; Walker, P.G.; Walters, C.; Winskill, P.; Whittaker, C.; Donnelly, C.A.; Riley, S.; Ghani, A.C. Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. Br. Med. J. 2020.
  33. Lin, R.; Lin, S.; Yan, N.; Huang, J. Do prevention and control measures work? Evidence from the outbreak of COVID-19 in China. Cities 2021, 118, 103347.
  34. Bhadra, A.; Mukherjee, A.; Sarkar, K. Impact of population density on Covid-19 infected and mortality rate in India. Model. Earth Syst. Environ. 2021, 7, 623–629.
  35. Xu, X.; Zhang, X.; Mendes, J. Impacts of preference and geography on epidemic spreading. Phys. Rev. E 2007, 76, 056109.
Figure 1. Forecasting framework of INCDTRM.
Figure 2. Dynamic transmission rate ($k = 7$).
Figure 3. Average MAPE of eight countries under different weights $\lambda$.
Figure 4. Fitting diagrams of dynamic transmission rate.
Figure 5. Global epidemic forecast.
Table 1. Four fitting functions.
| Fitting Function | Function Expression | Parameter | Reference |
| --- | --- | --- | --- |
| Two-parameter power function | f 1 ( t ) = β 1 t β 2 1 | $\beta_1 > 0$, $\beta_2 > 0$ | [18] |
| Three-parameter logarithmic function | $f_2(t) = \beta_1 \ln(\beta_2 t) + \beta_3$ | $\beta_1 < 0$, $\beta_2 > 0$, $\beta_3 \in \mathbb{R}$ | new |
| Three-parameter hyperbolic function | f 3 ( t ) = β 1 + t β 2 + β 3 t | $\beta_1 \in \mathbb{R}$, $\beta_2 + t\beta_3 \neq 0$ | [20] |
| Three-parameter logistic function | f 4 ( t ) = β 1 ( 1 β 2 exp ( β 3 t ) ) + β 4 | $\beta_1 > 0$, $\beta_2 < 0$, $\beta_3 < 0$, $\beta_4 \in \mathbb{R}$ | [8] |
Table 2. First and last report dates by country.
| Continent | Country | Number of Observed Days | First Report | Last Report | Cumulative Confirmed Cases | Cumulative Deaths |
| --- | --- | --- | --- | --- | --- | --- |
| North America | USA | 111 | 28/01/2020 | 17/05/2020 | 1,507,773 | 90,113 |
| North America | Canada | 111 | 28/01/2020 | 17/05/2020 | 77,257 | 5801 |
| Europe | Germany | 111 | 28/01/2020 | 17/05/2020 | 176,369 | 7958 |
| Europe | Italy | 109 | 31/01/2020 | 17/05/2020 | 224,760 | 31,763 |
| Europe | France | 87 | 21/02/2020 | 17/05/2020 | 179,630 | 27,532 |
| Europe | Spain | 107 | 01/02/2020 | 17/05/2020 | 276,505 | 27,563 |
| Asia | South Korea | 111 | 31/01/2020 | 17/05/2020 | 11,050 | 262 |
| Asia | Iran | 89 | 19/02/2020 | 17/05/2020 | 120,198 | 6988 |
Table 3. The best sliding window period.
| Country | Best k |
| --- | --- |
| USA | 7 |
| Canada | 4 |
| Germany | 2 |
| Italy | 7 |
| France | 4 |
| Spain | 2 |
| South Korea | 1 |
| Iran | 1 |
Table 4. Parameter search range.
| Parameter | Min | Max |
| --- | --- | --- |
| C | 0.1 | 1 |
| γ | 0.1 | 1 |
Table 5. Parameter values.
| Country | C | γ | ε |
| --- | --- | --- | --- |
| USA | 0.2370 | 0.9997 | 0.0001 |
| Canada | 0.1000 | 0.3817 | 0.0001 |
| Germany | 0.3000 | 0.9387 | 0.0001 |
| Italy | 0.2050 | 0.5301 | 0.0001 |
| France | 1.0000 | 0.1728 | 0.0001 |
| Spain | 0.1555 | 1.0000 | 0.0001 |
| South Korea | 0.6988 | 0.5003 | 0.0001 |
| Iran | 1.0000 | 0.1950 | 0.0001 |
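Table 6. Forecast accuracy of different models.
| Country | Criteria | $f_1(t)$ | $f_2(t)$ | $f_3(t)$ | $f_4(t)$ | DGRM [19] | NCDTRM [20] | INCDTRM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| USA | MAE | 181,553.51 | 203,871.45 | 86,416.17 | 39,651.20 | 181,553.51 | 32,902.72 | 12,068.99 |
| USA | RMSE | 213,320.96 | 239,621.02 | 100,376.01 | 57,410.14 | 213,320.96 | 48,042.41 | 16,261.69 |
| USA | MAPE | 17.91% | 20.11% | 8.54% | 3.83% | 17.91% | 3.18% | 1.20% |
| Canada | MAE | 2794.54 | 7550.63 | 5460.93 | 2077.57 | 2452.26 | 1160.59 | 1007.98 |
| Canada | RMSE | 3345.02 | 8740.13 | 6285.81 | 2267.68 | 2973.74 | 1320.66 | 1300.06 |
| Canada | MAPE | 8.63% | 23.37% | 16.91% | 7.39% | 7.56% | 3.59% | 3.11% |
| Germany | MAE | 4067.54 | 2377.99 | 1784.45 | 1784.76 | 4996.27 | 2717.74 | 1595.88 |
| Germany | RMSE | 4871.16 | 3044.10 | 2028.51 | 2028.81 | 5505.85 | 3374.36 | 1782.99 |
| Germany | MAPE | 22.33% | 13.21% | 7.70% | 7.70% | 26.24% | 15.10% | 7.81% |
| Italy | MAE | 13,804.08 | 18,131.27 | 6579.78 | 6580.14 | 13,804.08 | 3763.30 | 3509.84 |
| Italy | RMSE | 14,792.03 | 19,482.71 | 8047.19 | 8047.59 | 14,792.03 | 4104.03 | 3856.71 |
| Italy | MAPE | 16.13% | 21.24% | 8.03% | 8.03% | 16.13% | 4.27% | 3.97% |
| France | MAE | 32,738.55 | 36,091.83 | 15,380.00 | 15,378.93 | 28,154.59 | 9239.81 | 15,694.11 |
| France | RMSE | 37,108.48 | 40,924.42 | 18,531.11 | 18,529.96 | 32,288.02 | 11,827.25 | 19,165.06 |
| France | MAPE | 35.16% | 38.77% | 16.53% | 16.53% | 30.25% | 9.93% | 16.87% |
| Spain | MAE | 16,646.87 | 22,236.10 | 16,259.88 | 16,259.61 | 8681.20 | 10,815.32 | 5910.42 |
| Spain | RMSE | 20,180.09 | 26,274.62 | 19,429.43 | 19,429.11 | 9974.91 | 11,061.34 | 6619.12 |
| Spain | MAPE | 26.48% | 35.21% | 25.80% | 25.80% | 12.95% | 15.85% | 8.66% |
| South Korea | MAE | 237.76 | 207.44 | 104.72 | 54.87 | 194.70 | 48.07 | 46.15 |
| South Korea | RMSE | 277.01 | 247.12 | 133.12 | 74.67 | 225.65 | 61.45 | 57.95 |
| South Korea | MAPE | 22.82% | 20.05% | 10.20% | 5.34% | 18.59% | 4.62% | 4.42% |
| Iran | MAE | 7622.71 | 6945.72 | 5692.07 | 4353.94 | 7667.84 | 5104.99 | 5534.86 |
| Iran | RMSE | 9134.48 | 8412.68 | 7103.17 | 5715.05 | 9040.22 | 6613.54 | 7046.04 |
| Iran | MAPE | 48.41% | 43.96% | 35.72% | 26.95% | 48.85% | 31.72% | 34.55% |
| AVG_MAPE | | 24.73% | 26.99% | 16.18% | 12.70% | 22.31% | 11.03% | 10.07% |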
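As a quick consistency check, the AVG_MAPE row of Table 6 can be reproduced by averaging each model's MAPE over the eight countries. The sketch below does this for the NCDTRM and INCDTRM columns, with values copied from the table; the dictionary layout and variable names are illustrative only.

```python
# MAPE values (in percent) from Table 6, in the order USA, Canada, Germany,
# Italy, France, Spain, South Korea, Iran.
mape_by_model = {
    "NCDTRM": [3.18, 3.59, 15.10, 4.27, 9.93, 15.85, 4.62, 31.72],
    "INCDTRM": [1.20, 3.11, 7.81, 3.97, 16.87, 8.66, 4.42, 34.55],
}

# Unweighted mean over the eight countries.
avg_mape = {model: sum(values) / len(values)
            for model, values in mape_by_model.items()}

for model, value in avg_mape.items():
    print(f"{model}: {value:.2f}%")
```

The results round to 11.03% and 10.07%, matching the table. The average is unweighted, so Iran's large error contributes as much as any other country's.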
Table 7. The inflection point.
| Country | Estimated Inflection Point | Actual Inflection Point |
| --- | --- | --- |
| USA | 04/08/2020 | 31/05/2020 |
| Canada | 26/05/2020 | 31/05/2020 |
| Germany | 14/04/2020 | 08/04/2020 |
| Italy | 24/04/2020 | 20/04/2020 |
| France | 27/04/2020 | 16/04/2020 |
| Spain | 22/04/2020 | 25/04/2020 |
| South Korea | 15/03/2020 | 12/03/2020 |
| Iran | 06/04/2020 | 05/04/2020 |
Table 8. Sensitivity analysis results: relative change rate (average ± standard deviation over 10 experiments).
| Country | Dynamic Transmission Rate | The Existing Number of Cases |
| --- | --- | --- |
| USA | 0.16% ± 0.09% | 1.65% ± 1.05% |
| Canada | 0.14% ± 0.06% | 1.31% ± 0.50% |
| Germany | 0.30% ± 0.22% | 3.46% ± 2.74% |
| Italy | 0.24% ± 0.28% | 2.76% ± 3.19% |
| France | 0.25% ± 0.09% | 3.03% ± 1.18% |
| Spain | 0.23% ± 0.15% | 2.18% ± 1.44% |
| South Korea | 0.25% ± 0.16% | 2.56% ± 1.64% |
| Iran | 0.22% ± 0.17% | 2.14% ± 1.69% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Xie, X.; Luo, K.; Yin, Z.; Wang, G. Nonlinear Combinational Dynamic Transmission Rate Model and Its Application in Global COVID-19 Epidemic Prediction and Analysis. Mathematics 2021, 9, 2307. https://doi.org/10.3390/math9182307
