Next Article in Journal
Composite Multiscale Cross-Sample Entropy Analysis for Long-Term Structural Health Monitoring of Residential Buildings
Previous Article in Journal
Cryptographic Algorithm Using Newton-Raphson Method and General Bischi-Naimzadah Duopoly System
Previous Article in Special Issue
An Ultrametric Random Walk Model for Disease Spread Taking into Account Social Clustering of the Population
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Analytical Parameter Estimation of the SIR Epidemic Model. Applications to the COVID-19 Pandemic

1
Environment, Health and Safety, Leuven, IMEC, Kapeldreef 75, 3001 Leuven, Belgium
2
MMSDP, IICT, Bulgarian Academy of Sciences, Acad. G. Bonchev St., Block 25A, 1113 Sofia, Bulgaria
Entropy 2021, 23(1), 59; https://doi.org/10.3390/e23010059
Submission received: 16 November 2020 / Revised: 22 December 2020 / Accepted: 28 December 2020 / Published: 31 December 2020
(This article belongs to the Special Issue Entropy and Epidemiology)

Abstract

:
The SIR (Susceptible-Infected-Removed) model is a simple mathematical model of epidemic outbreaks, yet for decades it evaded the efforts of the mathematical community to derive an explicit solution. The present paper reports novel analytical results and numerical algorithms suitable for parametric estimation of the SIR model. Notably, a series solution of the incidence variable of the model is derived. It is proven that the explicit solution of the model requires the introduction of a new transcendental special function, describing the incidence, which is a solution of a non-elementary integral equation. The paper introduces iterative algorithms approximating the incidence variable, which allows for estimation of the model parameters from the numbers of observed cases. The approach is applied to the case study of the ongoing coronavirus disease 2019 (COVID-19) pandemic in five European countries: Belgium, Bulgaria, Germany, Italy and the Netherlands. Incidence and case fatality data obtained from the European Centre for Disease Prevention and Control (ECDC) are analysed and the model parameters are estimated and compared for the period Jan-Dec 2020.

1. Introduction

The dramatic outbreak of the coronavirus disease 2019 (COVID-19) pandemics and its ongoing progression boosted the scientific community’s interest in epidemic modelling and forecasting. The coronavirus 2019 (COVID-19) disease was reported to appear for the first time in Wuhan, China, and later it spread to Europe, which is the subject of the presented case studies, and eventually worldwide. While there are individual clinical reports for COVID-19 re-infections, the present stage of the pandemic still allows for the application of a relatively simple epidemic model, which is the subject of the present report. The motivation behind the presented research was the intention to accurately model the short-term dynamics of the outbreaks of COVID-19 pandemics, which by the time of the first submission of the present article, has infected more than 27 million individuals worldwide. The efforts to contain the spread of the pandemic induce sustained social and economic damage. Therefore, the ability to accurately forecast short to medium-term epidemic outbreak’s dynamics is of substantial public interest.
The present paper focuses on the utility of the SIR (Susceptible-Infected-Removed) epidemiological model in epidemic outbreak modelling. The SIR model was introduced in 1927 by Kermack and McKendrick in 1927 to study the plague and cholera epidemics in London and Bombay [1]. The original paper also introduced some asymptotes of the model variables. To date the SIR model remains as a cornerstone of mathematical epidemiology. It is a deterministic model formulated in terms of ordinary differential equations (ODEs). The model has been extensively used to study the spread of various infectious diseases (see the monograph of Martcheva [2]). Outside epidemiology, the SIR model is also extensively used in modelling of online social networks, viral marketing, diffusion of ideas, spread of computer viruses, financial network contagion, etc. (see recent survey by Rodrigues and the references therein [3]).
The SIR model can be modified in several ways, for example by including demographics, deceased population (i.e., SIRD) or hidden populations (i.e., “exposed” individuals–SEIR) [2]. As the conditions of an accelerating epidemic outbreak often make contact tracing problematic, the SEIR model needs to make additional assumptions. Therefore, outcome variables, such as mortality, can be foretasted more easily using a simpler model, e.g., using the SIR model. Recently, several authors demonstrated that the first wave of COVID-19 followed SIR dynamic, which by itself, is an interesting finding [4,5,6,7,8,9].
The present paper has two main objectives—(i) to report some new analytical results about SIR model and (ii) to introduce an algorithm for the estimation of the parameters of the SIR model from empirical time series data. A challenge with the forecasting, based on empirical estimation of the parameters, is the inherent uncertainty built in such estimates. This combined with the multiplicative interactions in the SIR model variables can lead to huge discrepancies between the observed and forecasted values. While the SIR model has been modified to include also other sub-populations the empirical estimation of the model parameters poses even greater challenge. In contrast to previous approaches, the paper does not approach the SIR model as an initial value problem but as a problem in the theory of special functions. This change of perspective allows for relative ease of handling of noisy data, e.g., time series having fluctuations caused by delays and accumulation of case reporting. The numerical approach was applied to the COVID-19 incidence and case fatality data from different European countries, having different population densities and dynamics of the epidemic outbreaks. The methods of the present paper go beyond the presented COVID-19 application case and can be used in a variety of fields as indicated above.

2. The SIR Model

The SIR model is formulated in terms of three populations of individuals. The S population consists of all individuals susceptible to the infection of concern. The I population comprises the infected individuals. These persons have the disease and can transmit it to the susceptible individuals. The R population represents the immune individuals, who cannot become infected and cannot transmit the disease to others. The model comprises a system of three ODEs:
S ˙ ( t ) = β N S ( t ) I ( t )
I ˙ ( t ) = β N S ( t ) I ( t ) γ I ( t )
R ˙ ( t ) = γ I ( t )
The model assumes a constant overall population N = S + I + R . A disease carrier infects on average β individuals per day, for an average time of 1 / γ days. The β parameter is called disease transmission rate, while γ recovery rate. The average number of infections arising from an infected individual is then modelled by the number R 0 = β γ , the basic reproduction number. Typical initial conditions are S ( 0 ) = S 0 ,   I ( 0 ) = I 0 ,   R ( 0 ) = 0 [1].
The model can be re-parametrized using normalized variables as
s ˙ = s i
i ˙ = s i g i ,   g = γ β = 1 R 0
r ˙ = g i ,
subject to normalization s + i + r = 1 and time rescaling τ = β t . This is a model in dimensionless time τ . It this way, g also becomes a dimensionless parameter. Since i ( τ ) is integrable on [ 0 , ) then i ( ) = 0 .

3. The Implicit Analytical Solution

The analytical solution will be formulated first in an implicit form. Since there is a first integral by construction the system can be reduced to two equations:
d i d s = 1 + g s
d i d r = s g 1
Remark 1.
From this formulation
R e = N s 0 g = S 0 β γ 1
must hold for the infection to propagate. R e is called the effective reproductive number, while the basic reproduction number is R 0 = R e N [10].
In order to solve the model we will consider the two equations separately. Direct integration of Equation (7) gives
i = s + g log s + c
where the constant c can be determined from the initial conditions. In the present treatment, the constant c will be left indeterminate to be assigned by the different re-parametrization procedures. The s variable can be expressed explicitly in terms of the Lambert W function [11]:
s = g W ± e i c g g
where the signs denote the two different real-valued branches of the function. Note, that both branches are of interest since the argument of the Lambert W function is negative. Therefore, Equation (5) can be reduced to the first-order autonomous system
i ˙ = i g W ± e i c g g + 1
valid for two disjoined domains on the real line. The equation can be solved for the time τ by quadrature as
d i i W ± e i c g g + 1 = g τ
Remark 2.
There is another equivalent form of the system using the Wright Ω function [12] since
W e i c g g = Ω i c g log ( g )
so that
i ˙ = i g Ω i c g log g g ± j π + 1
where j represents the imaginary unit.
The s variable can be determined by substitution in Equation (4), resulting in the autonomous system
s ˙ = s s + g log s + c
which can be solved implicitly as
d s s s g log s c = τ
Remark 3.
This implicit solution for the s-variable has been derived originally in [13]. Note that in the present case it was not necessary to derive and solve a Bernoulli differential equation [13,14]. Therefore, this is a new method of proof. The authors did not establish the non-elementary character of the integral. The results for the i-variable are original.
Finally, the r variable can also be conveniently expressed in terms of i. For this purpose we solve the differential equation
d r d i = g s g = 1 1 + W ± e i c g g
Therefore,
r = c 1 g log g W ± e i c g g
by Proposition A4. On the other hand,
g log g W e i c g g = g   log e i c g g W e i c g g = g W e i c g g + i g log g c = s + i g log g c
So that
r = g W e i c g g i + c 1
For the purposes of curve fitting we assume that i ( ) = r ( ) = 0 . Therefore,
c 1 = g W e c g g

3.1. Peak Value Parametrization

The peak-value parametrization is necessary for the treatment of the problem as a special function. The initial value parametrization is sensitive to the noise in the initial value conditions, and is therefore unsuitable for curve-fitting. The upper terminal of integration can be determined by the requirement for the real-valuedness of i. This value of i is denoted as i m ; that is
W ± e i m c g g = 1
Therefore,
i m = c g log g g
The peak-value parametrization is supported by the fact that i ( t ) attains a global maximum i = i m = c g log g g Proposition (A1).
If we consider formally the phase space z × y = g z W ± e z i m g 1 + 1 the following argument allows for the correct branch identification. For i it follows that W e z i m g 1 so y < 0 ; while W + e z i m g 1 0 + so y > 0 . Therefore, if we move the origin as t ( 0 ) = i m then conveniently
i m i d z z W + e z i m g 1 + 1 = g τ ,   τ > 0
i m i d z z W e z i m g 1 + 1 = g τ ,   τ 0
Furthermore, the recovered population under this parametrization is
r = g W ± e i i m g 1 g W e i m g 1 i
under the same choice of origin.

3.2. Initial Value Parametrization

As customarily accepted, the SIR model can be solved as an initial value problem. The indeterminate constant c can be eliminated using the initial condition
i 0 = s 0 + g log s 0 + c
Therefore,
i = i 0 + s 0 s + g log s / s 0 = 1 s + g log s 1 i 0
For this case, the following autonomous differential equation can be formulated:
i ˙ = g i W ± 1 i 0 g e i 1 g + 1
This can be solved implicitly by separation of variables as
i 0 i d z z W 1 i 0 g e z 1 g + 1 = g τ ,   τ t m
i 0 i d z z W + 1 i 0 g e z 1 g + 1 = g τ ,   τ > t m
It is noteworthy that the time to the peak of infections t m can be calculated as
t m = 0 log g / s 0 d u s 0 e u g u ( s 0 + i 0 )
The result follows by considering the autonomous system Equation (7) and fixing the upper terminal of integration s = g . However, by Proposition A3 this definite integral can be evaluated only numerically.

4. Is the Incidence Function “New”?

The incidence i-function of the SIR model appears to be an interesting object of study on its own. One may pose the question about the representation of this function by other, possibly, elementary functions. The answer to this question is in the negative as will be demonstrated below. In precise terms, this function is non-Liouvillian. However, this does not mean that the function can not be well approximated. Fortunately, this is the case as the i-function can be approximated for a sufficiently wide domain of parameter values by Newtonian iteration.
Definition 1.
An elementary function is defined as a function built from a finite number of combinations and compositions of algebraic, exponential and logarithm functions under algebraic operations (+,−,., /)
Allowing for the underlying field to be complex numbers— C , trigonometric functions become elementary as well.
Definition 2
(Liouvillian function). We say that f ( x ) is a Liouvillian function if it lies in some Liouvillian extension of C ( x ) , for some constant field C.
As a first point we establish the non-elementary character of the integral in Equation (11). The necessary introduction to the theory of differential fields is given in the Appendix C. From the work of Liouville it is known that if a function F ( x ) = f ( x ) e q ( x ) , where f, q are elementary functions, has an elementary anti-derivative of the form [15]
F ( x ) d x = f e q d x = h e q
for some elementary function h ( x ) [16]. Therefore, differentiating we obtain
f e q = h e q + h q e q
so that if e q 0
h + h q = f
holds. The claim can be strengthened to demand that h be algebraic for algebraic f and q.
Theorem 1.
The integrals
I ± ( ξ ) = d ξ ξ W ± e ξ c g g + 1
are not Liouvillian.
Proof. 
We use i m parametrization. Let c = i m + g g log g . The proof proceeds by change of variables – first ξ = g log y y g + g + i m ; followed by z = log ( ( g log y + i m + g ) / g ) .
I = d ξ ξ W ± e ξ i m g 1 + 1 = y 1 y g log ( y ) g y + i m + g W y e y + 1 d y = d y y ( g log ( y ) g y + i m + g ) = 1 g e z + i m g + 1 e e z e z + i m g + 1 d z
since W ± y e y = y . The last integral has the form
A e z e e z A e z d z
which allows for the application of the Liouville theorem in the form of Corollary A1. We can identify
f e z d x = h e z ,   f ( z ) = A e e z A e z ,   A = e i m g + 1
so that
A e e z A e z = h ( z ) + h ( z )
for some unknown algebraic h ( z ) . Since the left-hand side of the equation is transcendental in z so is the right-hand side. Therefore, the integrand has no elementary antiderivative. ☐
Theorem 2.
The incidence function i ( t ) , defined by the differential Equation (10), is not Liouvillian.
Proof. 
For the present case, let us assume that i ( 0 ) = i m so that i attains the maximum by Proposition A1. Therefore,
i = g i W e i i m g 1 + 1
Without loss of generality let g = 1 , which amounts to scaling of the solution by the factor of 1 / g .
Suppose further that
i i m 1 = log u u
for some algebraic function u (log-extension case). Then
i i = W e log u u + 1 = u + 1
On the other hand,
i i = log log u u + i m + 1 = 1 u u log u u + i m + 1 u
so that
1 u u log u u + i m + 1 = 1 u u
However, if u is algebraic so is u by Theorem A1. Therefore, we have a contradiction, since the left-hand-side is transcendental. Hence, i is not part of a logarithmic elementary extension.
Suppose that u is exponential, i.e., u = e f for some algebraic function f. In this case,
e f 1   f e f + f + i m + 1 = 1 e f f = e f + f + i m + 1
However,
f + f + i m + 1 = e f
Therefore, the right-hand-side is exponential and can not be algebraic as it is demanded by Corollary A1. This is a contradiction, hence f can not be algebraic, hence u is not part of an exponential elementary extension.
Finally, suppose that i ( t ) is algebraic. Since the Wright function Ω ( z ) is transcendental [12] it follows that W e i i m 1 = Ω i i m 1 + i π can not be algebraic in i. Therefore, i / i and hence i must be transcendental. Hence, the case of an algebraic i ( t ) can not hold either.
In summary all three cases are rejected, therefore, i is not Liouvillian. ☐

5. Explicit Series Solutions

We will derive two types of series for the i-function in view of the different re-parametrizations of the SIR model. The series can be derived starting from Equation (10) and applying the Taylor theorem.

5.1. Series for the i m Parametrization

The natural parametrization is fixing the peak at the origin. The Taylor development can be computed as follows:
i ( t ) = i m i m 2 g 2 t 2 + i m 3 g   6 t 3 + 4 i m 3 g 2 i m 4 g 24 t 4 15 i m 4 g 2 i m 5 g 120 t 5 +
and for the logarithm
log i ( t ) = log i m i m g 2 t 2 + i m 2 g 6 t 3 + i m 2 g 2 i m 3 g 24 t 4 5 i m 3 g 2 i m 4 g 120 t 5 +

5.2. Series for the i 0 -Parametrization

The Taylor series starting from an initial value i 0 is
i ( t ) = i 0 i 0 i 0 + g 1 t + g   i 0 4 i 0 2 + 5 g   i 0 7 i 0 3 g + 3 2 i 0 1 t 2 g   i 0 5 i 0 3 + 21 g   i 0 2 9 i 0 2 + 18 g 2   i 0 26 g   i 0 + 4 i 0 10 g 2 + 7 g 6 i 0 1 t 3 +
The Taylor series for the logarithm starting from an initial value i 0 is
log i ( t ) = log i 0 i 0 + g 1 t + g   i 0 i 0 + 2 g 1 2 i 0 1 t 2 g   i 0 i 0 + 2 g 1 2 i 0 + g 1 6 i 0 1 t 3 +
The series follow directly from successive differentiation of the differential Equation (19).

6. Numerical Approximation

The i-function can be efficiently approximated by the Newton’s method. The Newton iteration scheme is given as follows for the c-parametrization:
i n + 1 = i n i n   W ± e i n c g g + 1   g log g g + c i n d ξ ξ   W ± e ξ c g g + 1 + g t
This is a conceptually simple representation. However, it has the disadvantage of using the Lambert function inside the quadrature routine. Another equivalent representation is
i n + 1 = i n + g i n   W ± e i n c g g + 1   t g g W ± e i n c g g d y y   g log y y + c
(see Proposition A5). This form has the advantage of requiring only 1 Lambert function evaluation per iteration and using only elementary function in the quadrature routine.
A point of attention here is the choice of the initial value for the iteration scheme. Despite the present author’s best efforts, a rigorous analytical asymptotic valid on the entire real line and for all parameter values could not be found. Numerical experiments gave acceptable results using the formula
f ( x ) = b   e 1 x c e x c
g > 0 , c = 2 b g or c = 2 b g / e , and additionally c c / e ,   x > 0 for the initial value of i 0 = f ( t ) .
The analytical formula involving first exponentiation and then computation of the Lambert W function has disadvantages for large arguments due to float under or overflows [12]. Therefore, another special function can be used in principle, notably the Wright Ω function. On the other hand, optimized routines for its calculation are not readily available in MATLAB.

Plots

Plots of the branches of the integral I ( x ) (Figure 1) were obtained by direct numerical integration using the QUADPACK [17] routines in the Computer Algebra System Maxima. Plots of the SIR model (Figure 2) were obtained using a Java routine [18] implementing the double exponential integration method [19,20]. Both methods turn out to be suitable for the numerical integration problem.

7. Datasets

The COVID datasets were downloaded from the European Centre for Disease Prevention and Control (ECDC) website: https://opendata.ecdc.europa.eu/covid19/casedistribution/csv. The downloadable data file was updated daily until 14 December 2020 and contains the latest available public data on COVID-19 aggregated per country. The data collection policy is available from https://www.ecdc.europa.eu/en/covid-19/data-collection.

8. Data Analysis Pipeline

The SIR model was formulated to model epidemic outbreaks with short incubation period [1]. The model has only two independent variables and two parameters, which allow for their estimation from raw data. This allows for efficient curve-fitting approach as demonstrated further.
The SIR model can be modified into SIRD to account for the deceased population. This modification will not result in qualitative difference in the observed dynamics as it does not change the structure of the equations for the i and s-variables. Hence, it allows for an analysis using the SIR model.

8.1. Data Processing

The data were imported in the SQLite https://www.sqlite.org database, filtered by country and transferred to MATLAB for parametric fitting using native routines. Quadratures were estimated by the default MATLAB integration algorithms Estimated parameter values were stored in the same database.

8.2. Parametric Fitting

The parametric fitting was conducted using the fminsearchbnd routine which implements least-squares constrained optimization algorithm and is an alternative to the proprietary non-linear least squares routine of MATLAB [21]. To reduce the impact of the fluctuation in the weekly reporting of data the parametric fitting procedure was applied first to three day moving average of the time series. The fitting equation is given by
I t N · i t / 10.0 T | g , i m
where I t is the observed incidence or case fatality, respectively. For numerical stability reasons the time variable was rescaled by a factor of 10. The i m and T were estimated from the observed data as the apparent peak and the time to the apparent peak, respectively. The initial estimate of N was fixed to 1. An initial estimate of 0.75 was used (i.e., corresponding to R 0 = 1.33 ) for the g parameter.

9. Case Studies

The approach in the present paper is exemplified with data from ECDC for several European countries in the period January 2020–December 2020. The data analysis is limited to 14 December, since then ECDC changed reporting policy to weekly averages instead of daily averages. The dataset is available from https://zenodo.org/record/4383118 [22]. The paper focuses on five European countries – Bulgaria, Belgium, the Netherlands, Germany and Italy for the following reasons. Italy was selected as the country where the first outbreak erupted in Europe. Germany was selected as another example of a large European country with very intense internal and cross-border traffic. Belgium and the Netherlands were selected as densely populated European countries with intense internal and cross-border traffic. Moreover, the outbreaks in both countries could be traced to the North Italian outbreak. Finally, Bulgaria was selected as an example of an European country with relatively few cases during the first half of 2020.

9.1. Analysis of Case Fatality Data

The observed case fatality represented a parameter, which could be established with more confidence in the beginning of the pandemic due to the lack of testing and the non-specificity of the clinical signs of COVID-19. Hence, it was the primary target of the parametric fitting. The data fitting procedures are illustrated with the case fatality data of Belgium and are presented in Figure 3. The intermediate parameters are initially estimated on the 3-day moving average data. The final fit was performed on the raw data, using the intermediate parameters as initial values. The cumulative mortality data were estimated from Equation (17). As can be appreciated from Figure 3 fluctuations in reporting did not have a detrimental effect on the estimation procedure. The results for Belgium, the Netherlands, Germany and Italy are presented in Table 1 and Figure 4.

9.2. Analysis of Incidence Data

The raw data demonstrated weekly fluctuations most probably caused by the reporting irregularity due to weekend breaks and fluctuations in the testing demand. It is especially pronounced for the morbidity data of Germany for the presented period. The incidence data were analysed in the same way using the fitted parameters for the case fatality as initial values. The fitting results demonstrate different lags of the peaks of incidence vs case fatality in the studied countries. For example, for Germany it was 2.2 weeks, while for the Netherlands it was 0.3 weeks.

9.3. Analysis of the Second Wave

Changes of containment policies are meant to result in changes in the epidemic outbreak dynamics. This can be followed by the SIR model as demonstrated in the Bulgarian incidence dataset for the period March–July 2020, where resuming of public sports events in the end of June correlates with the 3rd increase of incidence (Figure 5). The process is difficult to automate because of the fluctuations in the data. Nevertheless, it was possible to accurately track the first three outbreaks.
The same approach was applied in the analysis of the second pandemic wave generally appearing after week 20 in the reported dataset. The results are presented in Figure 6 and Table 2, Table 3 and Table 4. Bulgaria was added to Table 2, Table 3 and Table 4 since the data exhibited pronounced peaks in both case fatality and incidence. Regarding the case fatality, Belgium, the Netherlands, and Italy exhibited smaller peaks during the second wave. All countries (except Bulgaria, since there was no pronounced first wave, Figure 5) exhibited an increase of R 0 for the case fatality. The case fatality peaks exhibited synchrony for Belgium and the Netherlands, corresponding to their geographical proximity. This was also true for the incidence peaks.
Regarding the incidence, the peaks of the second wave were larger everywhere. Belgium and Italy exhibited a drop in R 0 . The R 0 for Germany and the Netherlands appear to be anomalous as they are at large odds with the respective parameters for the case fatality.

10. Discussion

The presented results can be discussed along three main directions.

10.1. Analytical Aspects

For decades the SIR model evaded the efforts of the community to derive explicit solution. A formal analytical solution of the SIR model has been found only recently and formulated in the traditional setting of an initial value problem [13]. Yet, another perspective can be also of merit – the model can be treated as a manifold problem, which can be parametrized by any point on the flow. This is precisely the approach implemented by the peak-value parametrization in the present paper. The present paper establishes novel analytical results about the SIR model. The series solutions presented here are original. Other original results are the non-elementarity proofs for most of the presented integrals. Another interesting point is the proof of the non-Liouvillian character of the incidence i-function. An alternative proof could be also given, in principle, based on the work of Prelle and Singer [23].

10.2. Numerical Aspects

Presented results demonstrate the robustness of the fitting procedure with regard to the fluctuations in the raw data. On the other hand, more efforts are necessary in establishing a robust asymptotic of the incidence i-function. This is a clear direction for future research. Presented method is an alternative to commonly used phenomenological models (i.e., exponential, Richards, logistic, delayed logistic for R 0 estimation [24]) and can in principle be cast in a maximum likelihood fitting procedure. Such work, however, goes beyond the scope of the present paper.

10.3. Epidemiological Implications of the Results

Recorded data necessarily suffer from diverse biases. For example, the marked difference in the availability of tests in the early stages of the pandemics in different countries. Another one is the unsteady reporting resulting in fluctuations of the numbers. This severely limits the usefulness of the model formulation as an initial value problem, which is the standard numerical staring point. This can result in a drastic overestimation of the infection peak (see for example the predictions for UK in [25]).
A key finding of the present report is that simple models can be very useful in studying the epidemic outbreaks. This can be eventually extended to predicting the effects of different containment measures or the lack thereof [8]. Presented analysis corroborates the finding of other authors about the utility of the SIR model in analysing the COVID-19 pandemics [4,6,7,8]. Presented results indicate that there is a universality in the time evolution of COVID-19 and the same epidemic model, notably SIR, can be applied to countries having large differences in populations sizes and densities. More interestingly, the model seems to fit well also the case fatality data, which can be interpreted in the sense that the vulnerable population forms a distinct subpopulation from all susceptible individuals (e.g., elderly people). The data also lend support to a simple modification of the SIR model: notably – the SIRD model with an independent population of dead (D) persons. This corresponds to the recent findings of other authors [5,9].

Funding

This research received no external funding

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Supporting Results

Proposition A1.
i ( t ) attains a global maximum i = i m = c g log g g .
Proof. 
We use a parametrization for which i ( 0 ) = i m . Then
i ˙ ( τ ) = g i W ± e i i m g 1 + 1
It follows that
i ˙ ( 0 ) = 0 ,   i ( 0 ) = i m
so i m is an extremum. In the most elementary way since W(z) should be real-valued then
e i i m g 1 1 / e i i m g 0
Hence, i i m . ☐
The proof of Theorem 1 establishes also the validity of two additional propositions:
Proposition A2.
The integral
I = d y y g log ( y ) g y + c
is not elementary.
Proposition A3.
The integral
I = d y y + c e y
is not elementary.
Proof. 
By change of variables y = e x
I = d y y log ( y ) y + c = d x e x x c

Appendix B. Special Functions

Appendix B.1. The Lambert W Function

The Lambert W function can be defined implicitly by the equation
W ( z ) e W ( z ) = z ,   z C  
We observe that by Lemma A1 W ( z ) is transcendental. Furthermore, the Lambert function obeys the differential equation for x 1
W ( x ) = e W ( x ) 1 + W ( x )
W is a multivalued function. It has many applications in pure and applied mathematics. Properties of the W function are given in [11]. Useful identities
e W ( z ) = W ( z ) z
e n W ( z ) = z W ( z ) n
log W ( z ) = log z W ( z )
W n z n W ( z ) n 1 = n W ( z ) ,   n > 0 , z > 0
The W function is non-elementary and in particular it is non-Liouvillian [26]. The latter fact is not widely known, as pointed out by the authors. Its indefinite integral is:
W ( x ) d x = x W ( x ) + x W ( x ) x
Proposition A4.
d y 1 + W e y c g g = g log g W e y c g g + c o n s t
Proof. 
We differentiate
g log W e y c g g = e y c g W e y c g g g W e y c g g 1 + W e y c g g = 1 1 + W g e i c g
Proposition A5.
g log g g + c i d ξ ξ   W ± e ξ c g g + 1 = g g W ± e i c g g d y y   g log y y + c
Proof. 
We use the change of variables ξ c = g log y y and then simplification by the defining identity of the Lambert W function.
g log g g + c i d ξ ξ   W e ξ c g g + 1 = A B g y 1 g log y y + c   W e g log y y + c g c g g + 1 d y = A B g y y   g y log y y 2 + c y g + g y log y y 2 + c y d y = g A B d y y   g log y y + c
where
g log A A + c = g log g g + c ,   g log B B + c = i
Therefore, A = g and B = g W e i c g g . ☐

Appendix B.2. The Wright Ω Function

The Wright Ω function is related to the Lambert W function [11].
Ω ( z ) = W K ( z ) e z ,   z C
where K ( z ) = I m ( z ) π / 2 π is the unwinding number of z. Moreover,
Ω ( z ) + log Ω ( z ) = z ,   z t ± i π , t 1 ,
for the principal branch of the logarithm. It is a transcendental function. It obeys the differential equation
Ω ( x ) = Ω ( x ) 1 + Ω ( x )
The Ω function is non-Liovillian [26]. Its indefinite integral is:
Ω ( x ) d x = Ω ( x ) 2 2 + Ω ( x ) + c o n s t

Appendix C. Differential Fields

Definition A1.
Denote by C   x , c i , θ i the complex-valued ring, generated by the finite set of rational functions { θ i } i n and constants { c i } i n .
Definition A2.
An element θ is called algebraic if P ( x , θ ) = 0 for some polynomial
P ( x , t ) = t m + a m 1 t m 1 + + a 0 ,
where a i can be also rational functions of x, or else it is called transcendental.
Lemma A1
(Composition lemma). Denote by a and t the algebraic or transcendental elementary functions, respectively. The following compositions hold
a a = a ,   t a = t ,   a t = t
Proof. 
The case a a when a ( x ) is a polynomial is trivial. Suppose that a and b are both algebraic:
P ( x , a ) = 0 ,   Q ( x , b ) = 0
Without loss of generality suppose that a i are polynomial. Formally, b = f ¯ k ( x ) for any branch k with f ¯ algebraic since it is a root of a polynomial, where the bar denotes the inverse function in order to avoid confusion with exponentiation. Therefore,
a b = b m + a m 1 b m 1 + + a 0 = f ¯ k m ( x ) + a m 1 f ¯ k m 1 ( x ) + + a 0
is algebraic since it is computed by a finite sequence of algebraic operations.
Suppose that t = e x p ( x ) . Then e x p ( a ) is not algebraic, hence it is transcendental.
P ( x , e x ) = e x m + a m 1 e x m x + + a 0
is exponential.
Suppose that t = l o g ( x ) . Then l o g ( a ) is not algebraic, hence it is transcendental.
P ( x , log ( x ) ) = log m x + a m 1 log m 1 x + + a 0
is not algebraic, hence it is transcendental. ☐
In what follows is assumed that the differential field is of characteristic zero and has an algebraically closed field of constants. An element y of a differential field is said to be an exponential of an element A if y = A y , an exponential of an integral of an element A if y = A y ; logarithm of an element A if y = A / A , and an integral of an element A if y = A .
The next definition is due to [26].
Definition A3.
Let ( k , d / d x ) be a differential field of characteristic 0. A differential extension ( K , d / d x ) of k is called Liouvillian over k if there are θ 1 ,   ,   θ n K , such that K = C ( x , θ 1 ,   ,   θ n ) and for all i, at least one of the following
1. 
θ i is algebraic over k ( θ 1 ,   ,   θ n 1 )
2. 
θ i k ( θ 1 ,   ,   θ n 1 )
3. 
θ i / θ i k ( θ 1 ,   ,   θ n 1 )
holds. The constant subfield C ( K ) of K is defined to be the set of c in K, such that c = 0 .
The next theorem is due to [16].
Theorem A1.
If K is an elementary field, then it is closed under differentiation.
An elementary integrability theorem due to Conrad [16].
Theorem A2
(Rational Liouville criterion). For f , g C   ( x ) with f and g non-constant the function f ( x ) e g ( x ) can be integrated in elementary terms if and only if there exists a rational function h C   ( x ) such that h + g h = f .
The last result can be extended to algebraic functions as follows.
Corollary A1
(Algebraic Liouville criterion). For f ( x ) , g ( x ) algebraic and non-constant, the function f ( x ) e g ( x ) can be integrated in elementary terms if and only if there exists an algebraic function h ( x ) , for which h + g h = f .
Proof. 
Suppose that f and g are arbitrary elementary algebraic functions. Denote the primitive of f as f ÷ F . The integral can be integrated by parts
I = f ( x ) e g ( x ) d x = e g ( x ) d F = F ( x ) e g ( x ) F ( x ) e g ( x ) d x
Therefore,
f ( x ) + F ( x ) g ( x ) e g ( x ) d x = F ( x ) e g ( x )
We observe that g ( x ) is elementary by Theorem A1. The L.H.S has the form f e g and since f ( x ) + F ( x ) g ( x ) is elementary we can identify
h F ,   f 1 f + F g = h + h g
so that h + h g e g = h e g and the claim follows. ☐

References

  1. Kermack, W.O.; McKendrick, A.G. A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. Ser. A Contain. Pap. Math. Phys. Character 1927, 115, 700–721. [Google Scholar] [CrossRef] [Green Version]
  2. Martcheva, M. An Introduction to Mathematical Epidemiology; Springer: New York, NY, USA, 2015. [Google Scholar] [CrossRef]
  3. Rodrigues, H.S. Application of SIR epidemiological model: Newtrends. Int. J. Appl. Math. Inform. 2016, 10, 92–97. [Google Scholar]
  4. Postnikov, E.B. Estimation of COVID-19 dynamics “on a back-of-envelope”: Does the simplest SIR model provide quantitative parameters and predictions? Chaos Solitons Fractals 2020, 135, 109841. [Google Scholar] [CrossRef] [PubMed]
  5. Cooper, I.; Mondal, A.; Antonopoulos, C.G. A SIR model assumption for the spread of COVID-19 in different communities. Chaos Solitons Fractals 2020, 139, 110057. [Google Scholar] [CrossRef] [PubMed]
  6. Ahmetolan, S.; Bilge, A.H.; Demirci, A.; Peker-Dobie, A.; Ergonul, O. What Can We Estimate From Fatality and Infectious Case Data Using the Susceptible-Infected-Removed (SIR) Model? A Case Study of Covid-19 Pandemic. Front. Med. 2020, 7. [Google Scholar] [CrossRef] [PubMed]
  7. Nguemdjo, U.; Meno, F.; Dongfack, A.; Ventelou, B. Simulating the progression of the COVID-19 disease in Cameroon using SIR models. PLoS ONE 2020, 15, e0237832. [Google Scholar] [CrossRef] [PubMed]
  8. Record, N.R.; Pershing, A. A note on the effects of epidemic forecasts on epidemic dynamics. PeerJ 2020, 8, e9649. [Google Scholar] [CrossRef]
  9. Fanelli, D.; Piazza, F. Analysis and forecast of COVID-19 spreading in China, Italy and France. Chaos Solitons Fractals 2020, 134, 109761. [Google Scholar] [CrossRef]
  10. Weiss, H. The SIR model and the foundations of Public Health. Mat. Mat. 2013, 3, 1–17. [Google Scholar]
  11. Corless, R.M.; Gonnet, G.H.; Hare, D.E.G.; Jeffrey, D.J.; Knuth, D.E. On the Lambert W function. Adv. Comput. Math. 1996, 5, 329–359. [Google Scholar] [CrossRef]
  12. Corless, R.M.; Jeffrey, D.J. The Wright Ω Function. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2002; pp. 76–89. [Google Scholar] [CrossRef]
  13. Harko, T.; Lobo, F.S.N.; Mak, M.K. Exact analytical solutions of the Susceptible-Infected-Recovered (SIR) epidemic model and of the SIR model with equal death and birth rates. Appl. Math. Comput. 2014, 236, 184–194. [Google Scholar] [CrossRef] [Green Version]
  14. Heng, K.; Althaus, C.L. The approximately universal shapes of epidemic curves in the Susceptible-Exposed-Infectious-Recovered (SEIR) model. Sci. Rep. 2020, 10. [Google Scholar] [CrossRef] [PubMed]
  15. Rosenlicht, M. On the explicit solvability of certain transcendental equations. Publ. MathÉmatiques l’IHES 1969, 36, 15–22. [Google Scholar] [CrossRef]
  16. Conard, B. Impossibility Theorems for Elementary Integration; Technical Report; 2005 Academy Colloquium Series; Clay Mathematics Institute: Peterborough, NH, USA, 2005. [Google Scholar]
  17. Piessens, R.; Doncker-Kapenga, E.; Überhuber, C.W.; Kahaner, D.K. Quadpack; Springer: Berlin/Heidelberg, Germany, 1983. [Google Scholar] [CrossRef]
  18. Prodanov, D. DSP Quadrature Library for JAVA. 2020. Available online: https://github.com/dprodanov/dspquad (accessed on 27 December 2020).
  19. Mori, M. Quadrature formulas obtained by variable transformation and the DE-rule. J. Comput. Appl. Math. 1985, 12–13, 119–130. [Google Scholar] [CrossRef] [Green Version]
  20. Mori, M.; Sugihara, M. The double-exponential transformation in numerical analysis. J. Comput. Appl. Math. 2001, 127, 287–296. [Google Scholar] [CrossRef] [Green Version]
  21. Errico, D. Fminsearchbnd. MATLAB Central Exchange. 2012. Available online: https://www.mathworks.com/matlabcentral/fileexchange/8277-fminsearchbnd-fminsearchcon (accessed on 27 December 2020).
  22. Prodanov, D. SIR Model Fitting for COVID-19 Dataset. 2020. Available online: https://zenodo.org/record/4383118 (accessed on 27 December 2020).
  23. Prelle, M.J.; Singer, M.F. Elementary first integrals of differential equations. Trans. Am. Math. Soc. 1983, 279, 215. [Google Scholar] [CrossRef]
  24. Ma, J.; Dushoff, J.; Bolker, B.M.; Earn, D.J.D. Estimating initial epidemic growth rates. Bull. Math. Biol. 2014, 76, 245–260. [Google Scholar] [CrossRef] [PubMed]
  25. Ferguson, N.; Laydon, D.; Nedjati Gilani, G.; Imai, N.; Ainslie, K.; Baguelin, M.; Bhatia, S.; Boonyasiri, A.; Cucunuba Perez, Z.; Cuomo-Dannenburg, G.; et al. Report 9: Impact of Non-Pharmaceutical Interventions (NPIs) to Reduce COVID19 Mortality and Healthcare Demand; Technical Report; Imperial College: London, UK, 2020. [Google Scholar] [CrossRef]
  26. Bronstein, M.; Corless, R.M.; Davenport, J.H.; Jeffrey, D.J. Algebraic properties of the Lambert W function from a result of Rosenlicht and of Liouville. Integral Transform. Spec. Funct. 2008, 19, 709–712. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Plots of the integrals I ± ( x ) . (A)—c-parametrization with parameters g = 0.75, c = 6.0; (B)—c-parametrization with parameters g = 1.0, c = 3.0. The negative branch is below 0.
Figure 1. Plots of the integrals I ± ( x ) . (A)—c-parametrization with parameters g = 0.75, c = 6.0; (B)—c-parametrization with parameters g = 1.0, c = 3.0. The negative branch is below 0.
Entropy 23 00059 g001
Figure 2. The SIR model variables as functions of time. The instance is parametrized by i m = 5.81 , g = 2.0 .
Figure 2. The SIR model variables as functions of time. The instance is parametrized by i m = 5.81 , g = 2.0 .
Entropy 23 00059 g002
Figure 3. Case fatality model for Belgium. (A)—The parametric fit of the case fatality data; (B)—Cumulative deaths compared to the estimate from the r-variable. The origin corresponds to 1 January 2020.
Figure 3. Case fatality model for Belgium. (A)—The parametric fit of the case fatality data; (B)—Cumulative deaths compared to the estimate from the r-variable. The origin corresponds to 1 January 2020.
Entropy 23 00059 g003
Figure 4. Incidence and case fatality modelling of the first wave. (A)—combined data for Belgium by 29 June 2020; (B)—combined data for The Netherlands by 29 June 2020; (C)—combined data for Germany by 29 June 2020; (D)—combined data for Italy by 29 June 2020; The raw data are smoothed with a 3-day moving average filter; ’mortality’ represents the reported case fatality, ’morbidity’ represents the reported incidence.
Figure 4. Incidence and case fatality modelling of the first wave. (A)—combined data for Belgium by 29 June 2020; (B)—combined data for The Netherlands by 29 June 2020; (C)—combined data for Germany by 29 June 2020; (D)—combined data for Italy by 29 June 2020; The raw data are smoothed with a 3-day moving average filter; ’mortality’ represents the reported case fatality, ’morbidity’ represents the reported incidence.
Entropy 23 00059 g004
Figure 5. Modelling of consecutive outbreaks in Bulgaria. (A)—Raw data are compared to three fitted outbreaks recorded in the period 8 March 2020–29 June 2020. The origin corresponds to 8 March 2020 when the first COVID-19 case was reported. (B)—Raw data for the period 8 March 2020–14 Deccember 2020 were fitted against the SIR model; ‘sir1’—data fit of the reported cases; ‘sir2’—data fit of the reported deaths.
Figure 5. Modelling of consecutive outbreaks in Bulgaria. (A)—Raw data are compared to three fitted outbreaks recorded in the period 8 March 2020–29 June 2020. The origin corresponds to 8 March 2020 when the first COVID-19 case was reported. (B)—Raw data for the period 8 March 2020–14 Deccember 2020 were fitted against the SIR model; ‘sir1’—data fit of the reported cases; ‘sir2’—data fit of the reported deaths.
Entropy 23 00059 g005
Figure 6. Incidence and case fatality modelling of the second wave. (A)—combined data for Belgium by 14 December 2020; (B)—combined data for The Netherlands by 14 December 2020; (C)—combined data for Germany by 14 December 2020; (D)—combined data for Italy by 14 December 2020; The raw data are smoothed with a 3-day moving average filter; mortality represents the case fatality, morbidity represents the incidence.
Figure 6. Incidence and case fatality modelling of the second wave. (A)—combined data for Belgium by 14 December 2020; (B)—combined data for The Netherlands by 14 December 2020; (C)—combined data for Germany by 14 December 2020; (D)—combined data for Italy by 14 December 2020; The raw data are smoothed with a 3-day moving average filter; mortality represents the case fatality, morbidity represents the incidence.
Entropy 23 00059 g006
Table 1. Case fatality parameters, first wave. T is given in weeks and refers to the time passed since 1 January 2020.
Table 1. Case fatality parameters, first wave. T is given in weeks and refers to the time passed since 1 January 2020.
Countryg R 0 T [Weeks] i m
Belgium0.73801.354914.8313.97
Netherlands0.50091.996214.1156.72
Germany0.61091.637015.1226.51
Italy0.37342.676012.8785.39
Table 2. Case fatality parameters, second wave. T is given in weeks and refers to the time passed since 1 January 2020.
Table 2. Case fatality parameters, second wave. T is given in weeks and refers to the time passed since 1 January 2020.
Countryg R 0 T [Weeks] i m
Bulgaria1.01.038.0136.45
Belgium0.49132.035644.8201.88
Netherlands0.32603.067945.371.00
Germany0.58491.709649.8408.33
Italy0.30393.290647.6731.09
Table 3. Incidence parameters, first wave. T is given in weeks and refers to the time passed since 1 January 2020; R 0 = 1 / g ; i m corresponds to the peak of the case fatality.
Table 3. Incidence parameters, first wave. T is given in weeks and refers to the time passed since 1 January 2020; R 0 = 1 / g ; i m corresponds to the peak of the case fatality.
Countryg R 0 T [Weeks] i m
Belgium0.55001.818314.01499.59
Netherlands0.51491.942013.81143.25
Germany0.54831.823712.95383.26
Italy0.40062.496112.35560.30
Table 4. Incidence parameters, second wave. T is given in weeks and refers to the time passed since 1 January 2020; R 0 = 1 / g ; i m corresponds to the peak of the case fatality.
Table 4. Incidence parameters, second wave. T is given in weeks and refers to the time passed since 1 January 2020; R 0 = 1 / g ; i m corresponds to the peak of the case fatality.
Countryg R 0 T [Weeks] i m
Bulgaria1.13070.884446.23465.44
Belgium1.01.042.414,970.01
Netherlands0.10999.10342.77995.74
Germany0.045422.038247.319,595.36
Italy0.55481.802445.234,949.74
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Prodanov, D. Analytical Parameter Estimation of the SIR Epidemic Model. Applications to the COVID-19 Pandemic. Entropy 2021, 23, 59. https://doi.org/10.3390/e23010059

AMA Style

Prodanov D. Analytical Parameter Estimation of the SIR Epidemic Model. Applications to the COVID-19 Pandemic. Entropy. 2021; 23(1):59. https://doi.org/10.3390/e23010059

Chicago/Turabian Style

Prodanov, Dimiter. 2021. "Analytical Parameter Estimation of the SIR Epidemic Model. Applications to the COVID-19 Pandemic" Entropy 23, no. 1: 59. https://doi.org/10.3390/e23010059

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop