Introduction

The current outbreak of the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection has reached most countries around the world and has had a devastating impact on several of them. Physicians of various specialties have found themselves in the eye of the storm of the coronavirus disease 2019 (COVID-19). Physicians and physician-scientists are working around the clock to treat infected patients, educate the public about social distancing measures, and test potential treatments and vaccines. To fully understand the nature of the pandemic and the impact of social distancing measures, one must understand the mathematics behind it. One of the most relevant mathematical models relating to the spread of a pandemic is the susceptible-infectious-removed (SIR) model and its variants. Unfortunately, the mathematical depth of this model can seem daunting to some physicians and physician-scientists. In this paper, we will try to explain and simplify the mathematics behind some of these epidemiological models.

Epidemiological models that divide a population into compartments are called compartmental models. The most commonly used of these is the SIR model, which has been used in several studies analyzing the spread of COVID-19 [5,6,7,8]. These models help us understand how COVID-19 spreads, predict the regional peaks of the pandemic, and understand the impact of various quarantine measures. All the epidemiological models that we will discuss in this paper are based on analyzing systems of differential equations. Differential equations focus on the rate of change of a variable, or group of variables, as time passes. They are found in almost all aspects of medicine. They help us dose medications, understand disease spread, and even dialyze patients. When more than one variable is involved, we usually need more than one differential equation to model a given situation. Together, these equations are referred to as a system of differential equations. Let us begin with the most basic form of the SIR model.

SIR Model

During a pandemic, two of the most important things to know are how fast it will spread and what measures can be taken to slow it down. These impact public health policies including quarantines, travel restrictions, and resource allocation. The SIR model divides a given population into three groups: susceptible, infectious, and removed. As time passes, the number of people in each of these groups changes. The number of susceptible people is highest at the very beginning of a pandemic, since in most cases everyone who is not infected and has not previously been infected is considered susceptible. On the other hand, the number of infectious individuals is at its lowest at the beginning of a pandemic. As time passes, the number of susceptible people decreases, and the number of infectious people increases. These changes can be modeled using differential equations.

Let us assume that the independent variable (t) stands for time measured in days. Time is the only independent variable in this case. In other words, all other variables evolve as a function of time. Let us use (S) to refer to the number of susceptible individuals at any given time (t). Another way to indicate that S is dependent on time is to call it S(t). Similarly, I or I(t) represents the number of infectious individuals as a function of time. R or R(t) represents the number of removed individuals as a function of time. Removed means that they are no longer contagious, either because they recovered or because they died. The three dependent variables, S, I, and R, represent the three possible segments of a given population of N people. This means that the sum of all three variables is equal to N [1,2,3, 8].

$$ S+I+R=N $$

In Fig. 1, we can see that as time passes, S(t) decreases, while I(t) and R(t) increase. The sum of the three variables at any given time remains constant as long as the total number of people in the population, N, is constant. If there is a significant change in N, the basic SIR model cannot be used, and a different epidemiological model would have to be used. If we focus on the infectious group, I(t), shown in red, we see that once it peaks, it starts to decrease again. That is because at some point in time, everyone who was infected will have to move to the removed group, R(t). Everyone with an infection has to either recover from the infection or die as a result of it at some point.

Fig. 1

SIR model. The above figure shows an example of an SIR model, with the susceptible group in yellow, the infectious group in red, and the removed group in green. Adapted from [4]

To better understand the connection between the variables above, we can express each of them as a fraction of the total population, N. This way, the sum of the three fractions is always equal to 1, as long as N remains unchanged. Remember that a total of 1.0 is the same as 100%. Therefore, the equation above (S + I + R = N) can be rewritten as

$$ \frac{S}{N}+\frac{I}{N}+\frac{R}{N}=1 $$

We can also write the time variable (t) explicitly, without changing the overall equation:

$$ \frac{S(t)}{N}+\frac{I(t)}{N}+\frac{R(t)}{N}=1 $$

Let us simplify the equation above by using a lowercase letter to represent each fraction. In other words, let us use s(t) to represent the ratio of susceptible individuals to the total population at any given time, instead of their actual number, i(t) to represent the ratio of infectious individuals, and r(t) to represent the ratio of removed individuals [1,2,3, 9].

$$ {\displaystyle \begin{array}{c}s(t)=S(t)/N;\\ {}i(t)=I(t)/N;\\ {}r(t)=R(t)/N\end{array}} $$

The reason for doing this is to make calculations easier. Working with three variables whose sum is always 1 is much more elegant than working with variables whose sum is a large number such as N. It also makes it easier to compare findings between populations of different sizes.

$$ s(t)+i(t)+r(t)=1 $$
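As a small illustration of this normalization, the sketch below converts head counts into fractions of the population. The function name `to_fractions` and the counts used (990, 10, and 0) are hypothetical and chosen only for illustration; they are not data from any study.

```python
def to_fractions(S, I, R):
    """Convert compartment head counts into fractions of the total population N."""
    N = S + I + R  # total population, assumed constant
    return S / N, I / N, R / N


# Hypothetical population of 1000 people, 10 of whom are infectious.
s, i, r = to_fractions(S=990, I=10, R=0)
print(f"s = {s:.3f}, i = {i:.3f}, r = {r:.3f}, sum = {s + i + r:.3f}")
```

Whatever the size of N, the three fractions add up to 1, which is what makes populations of different sizes directly comparable.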

The extent to which the disease spreads at any given time depends on several factors. The first factor is the number of individuals who are susceptible to the disease, s(t). One way to reduce the number of susceptible people is by vaccination. The second factor that affects disease spread is the number of infectious individuals, i(t). This can be reduced by isolating infectious individuals within a population and preventing the entry of more infectious individuals from other populations. Finally, the spread of the infection also depends on the rate of transmission of disease per contact. We will use the parameter β to represent the chance that an infectious individual will transmit the disease to a susceptible individual. It depends on the likelihood that an infectious individual comes in contact with a susceptible individual and the rate of disease transmission per contact [1,2,3, 8, 9]. This is where social distancing, hand hygiene, and wearing masks have the most impact.

Fig. 2 shows that if an individual is susceptible at a given time, he or she will either stay in that group or move into the infectious group. Since the number of susceptible people can only decrease over time, the rate of change for susceptible individuals must always be a negative number. The magnitude of this change depends on the ratio of infectious individuals at any given time, i(t), the ratio of susceptible individuals, s(t), and the likelihood of disease transmission between the two groups, β. We will express the rate of change of the susceptible group, s(t), as a differential equation. The notation \( \frac{d}{dt} \) simply indicates the rate of change over time.

Fig. 2

Transmission of disease

The rate of change of the susceptible individuals over time can be expressed as \( \frac{ds}{dt} \) [1,2,3,4, 8].

$$ \frac{ds}{dt}=-\beta \times s(t)\times i(t) $$

The differential equation above shows that the rate of change of susceptible individuals, \( \frac{ds}{dt} \), at any given time depends on β, s(t), and i(t). The negative sign indicates that the rate of change is always negative, since s(t) can only decrease.
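To make the equation concrete, the following sketch evaluates \( \frac{ds}{dt} \) at three moments of an outbreak. The value of β and the three (s, i) snapshots are hypothetical numbers picked for illustration only.

```python
def ds_dt(beta, s, i):
    """Rate of change of the susceptible fraction: ds/dt = -beta * s * i."""
    return -beta * s * i


beta = 0.3  # hypothetical transmission parameter, per day

# Hypothetical snapshots (s, i) early, mid-way, and late in an outbreak.
snapshots = {"early": (0.99, 0.01), "middle": (0.50, 0.30), "late": (0.10, 0.05)}
slopes = {label: ds_dt(beta, s, i) for label, (s, i) in snapshots.items()}

for label, slope in slopes.items():
    print(f"{label:>6}: ds/dt = {slope:.5f} per day")
```

For these made-up snapshots, the rate is negative every time. It is small early on, when few people are infectious, largest in magnitude in the middle, and small again late, when few susceptible people remain.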

Figure 3 models s(t) vs time (t). The dotted lines show the slope of the curve at a given point in time. This slope is equal to \( \frac{ds}{dt} \). Notice how the slope is always negative. We can also see that the slope increases in magnitude at first but then starts to flatten.

Fig. 3

Susceptibility over time. Notice how the slope of the curve changes over time

Figure 4 shows how changing the magnitude of β impacts the susceptibility curve. When β is relatively large, the infection spreads fast and the number of susceptible individuals drops quickly. When β is relatively small, we see a flatter curve as the disease spread is slowed down.

Fig. 4

Three graphs of the susceptible group with varying magnitudes of β

The rate at which infectious individuals move into the removed group, R, is called γ. The removed group includes individuals who recover and those who die, since both are removed from the infectious pool. The average number of days it takes for an individual to recover from the disease, n, is inversely proportional to γ. Factors that reduce the length of illness can include medications as well as environmental factors (Fig. 5).

$$ \gamma =\frac{1}{n} $$

Fig. 5

Removal from infectious group

The rate of change of the removed group at any given time depends on the ratio of infectious individuals at that time and the value of γ [1,2,3,4, 8]. This helps us come up with our second differential equation, which focuses on the rate of change of the removed group. Notice that the rate of change in this case is always a positive number, since the number of removed people can only increase with time.

$$ \frac{dr}{dt}=\gamma \times i(t) $$
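As a quick worked example, the sketch below derives γ from an average length of illness and then evaluates \( \frac{dr}{dt} \). The 10-day illness and the 20% infectious fraction are hypothetical values, and the function names are ours.

```python
def gamma_from_duration(n_days):
    """Removal rate as the reciprocal of the average length of illness, n."""
    return 1.0 / n_days


def dr_dt(gamma, i):
    """Rate of change of the removed fraction: dr/dt = gamma * i."""
    return gamma * i


gamma = gamma_from_duration(10)  # hypothetical 10-day illness
rate = dr_dt(gamma, i=0.2)       # hypothetical: 20% of the population is infectious
print(f"gamma = {gamma:.2f} per day, dr/dt = {rate:.3f} per day")
```

With these numbers, about 2% of the population moves into the removed group per day. Halving the length of illness to 5 days would double γ and, for the same i(t), double that rate.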

Figure 6 shows us that the slope of the curve is always positive for the removed group. Note that the slope is equal to \( \frac{dr}{dt} \).

Fig. 6

Removed group over time. Notice the S-shape of the curve as the slope changes

Figure 7 demonstrates how changing the value of γ can affect the removed curve. When γ is large, people recover very quickly and move from the infectious group to the removed group sooner. This means that the disease could die out before infecting the entire population. In other words, a large γ means fewer people ultimately catch the infection.

Fig. 7

Three graphs of the removed group with varying magnitudes of γ

Characteristics of the Infectious Curve

We mentioned previously that the rate of change for the susceptible group is always negative and that the rate of change of the removed group is always positive. Figure 8, below, shows us that the slope of the infectious curve is positive at first, until the curve reaches its peak, and then becomes negative.

Fig. 8

The infectious curve i(t) as a function of time (t)

When the rate of change is positive, it means that more people are getting infected than are recovering. When the rate of change is negative, it means that more people are recovering than are getting infected, which happens after the disease reaches its peak. The rate of change of the infectious group depends on both γ and β. It also depends on the ratio of individuals in the infectious and susceptible groups at a given time. As β increases, the rate of change for i(t) increases, and as γ increases, the rate of change for i(t) decreases, in accordance with the following differential equation [1,2,3,4, 8]:

$$ \frac{di}{dt}=\beta \times i(t)\times s(t)-\gamma \times i(t) $$

The differential equation above shows the rate of change of the infectious group, \( \frac{di}{dt} \), on the left side. The rate increases as β increases and as s(t) increases. This makes sense because more people are likely to get infected when the size of the susceptible population is larger or when the risk of transmission is higher. The rate decreases when γ is higher, since this suggests faster recovery.
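One way to see when the infectious curve rises and when it falls is to factor i(t) out of the right side of the equation. This short derivation uses only the symbols already defined:

$$ \frac{di}{dt}=i(t)\times \left(\beta \times s(t)-\gamma \right) $$

Since i(t) is never negative, the sign of the rate of change is set by the term in parentheses. As long as some individuals are infectious,

$$ \frac{di}{dt}>0\kern0.5em \mathrm{when}\kern0.5em s(t)>\frac{\gamma }{\beta } $$

The infectious curve therefore rises while s(t) is larger than γ/β, reaches its peak when s(t) equals γ/β, and falls once s(t) drops below that value.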

The average number of people that each infectious person infects is called the basic reproductive number, R0 [8]. Assuming that the total population is 1.0 and that each of the three subgroups is a fraction of the total, R0 can be calculated as follows:

$$ {R}_0=\frac{\beta }{\gamma } $$

When R0 is greater than 1, each infectious person infects more than one other person on average, and the number of infections grows. In Fig. 9, graph C has the highest R0. When R0 is less than 1, the number of infections declines. In Fig. 9, graph G has the lowest R0. Several factors help reduce R0 including social distancing, hand hygiene, and vaccination. R0 can also be used to estimate the herd immunity threshold (HIT), which is the minimum ratio of individuals that must become immune to a disease so that it would die out (Fig. 10). It can be calculated as follows [8].

Fig. 9

This figure shows how the infectious curve changes with changing values of γ and β. Similar patterns can be inferred about mortality trends which occur later in time. We can see that as β gets smaller, the infection curve becomes flatter. As γ gets larger, the peak gets smaller

Fig. 10

Basic reproductive number and herd immunity threshold. Adapted from [8]

$$ HIT=1-\frac{1}{R_0} $$
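The two formulas can be combined in a few lines. In the sketch below, the values of β and γ are hypothetical, and returning 0 when R0 is at most 1 is our own assumption, reflecting that an outbreak with such an R0 would fade without any immunity.

```python
def basic_reproductive_number(beta, gamma):
    """R0 = beta / gamma, with each group expressed as a fraction of the population."""
    return beta / gamma


def herd_immunity_threshold(r0):
    """HIT = 1 - 1/R0. Assumption: return 0 when R0 <= 1, where no threshold is needed."""
    if r0 <= 1:
        return 0.0
    return 1.0 - 1.0 / r0


r0 = basic_reproductive_number(beta=0.5, gamma=0.25)  # hypothetical parameters
hit = herd_immunity_threshold(r0)
print(f"R0 = {r0:.1f}, HIT = {hit:.0%}")  # prints: R0 = 2.0, HIT = 50%
```

In this example each infectious person infects two others on average, so half of the population must be immune before the disease begins to die out. A disease with an R0 of 4 would require 75%.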

SEIR Model

While the SIR model is the most common, other variants are also used by epidemiologists. Each of these models uses its own system of differential equations. As mentioned before, the system of differential equations used for the standard SIR model is as follows:

  • \( \frac{ds}{dt}=-\beta \times s(t)\times i(t) \) rate of change of susceptible group

  • \( \frac{di}{dt}=\beta \times i(t)\times s(t)-\gamma \times i(t) \) rate of change of infectious group

  • \( \frac{dr}{dt}=\gamma \times i(t) \) rate of change of removed group
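The three equations above can be solved numerically by advancing time in small steps. The sketch below uses the forward Euler method, which is our choice for simplicity rather than the method used in the cited studies. The parameter values (β = 0.3, γ = 0.1), the starting fractions, and the function name `simulate_sir` are all hypothetical.

```python
def simulate_sir(beta, gamma, s_init, i_init, r_init=0.0, days=160, dt=0.1):
    """Integrate the SIR equations with forward Euler steps of dt days.

    Returns a list of (t, s, i, r) tuples, one per time step.
    """
    s, i, r = s_init, i_init, r_init
    history = [(0.0, s, i, r)]
    for k in range(1, int(round(days / dt)) + 1):
        new_infections = beta * s * i * dt  # fraction moving from s to i
        new_removals = gamma * i * dt       # fraction moving from i to r
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((k * dt, s, i, r))
    return history


# Hypothetical outbreak: 1% of the population is infectious on day 0.
history = simulate_sir(beta=0.3, gamma=0.1, s_init=0.99, i_init=0.01)
peak_t, _, peak_i, _ = max(history, key=lambda row: row[2])
_, s_end, i_end, r_end = history[-1]
print(f"peak i = {peak_i:.3f} on day {peak_t:.1f}")
print(f"final s = {s_end:.3f}, i = {i_end:.3f}, r = {r_end:.3f}")
```

Every quantity that leaves one group enters another, so s(t) + i(t) + r(t) stays equal to 1 at each step. Rerunning the function with a smaller β or a larger γ is a simple way to explore how the curves flatten.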

Another commonly used epidemiological model is called the susceptible-exposed-infectious-removed (SEIR) model. The main difference between the SEIR model and the SIR model is the addition of the exposed group to the SEIR model. The exposed group is a step between the susceptible and the infectious groups. It includes individuals who have been exposed to the infection but are not themselves infectious yet. Since we have four groups instead of three in this model, we require four differential equations to describe the spread of infection [10, 11].

  • \( \frac{ds}{dt}=-\beta \times s(t)\times i(t) \) rate of change of susceptible group

  • \( \frac{de}{dt}=\beta \times i(t)\times s(t)-\boldsymbol{\delta} \times \boldsymbol{e}\left(\boldsymbol{t}\right) \) rate of change of exposed group

  • \( \frac{di}{dt}=\boldsymbol{\delta} \times \boldsymbol{e}\left(\boldsymbol{t}\right)-\gamma \times i(t) \) rate of change of infectious group

  • \( \frac{dr}{dt}=\gamma \times i(t) \) rate of change of removed group

Notice the addition of the coefficient δ, which gives us the rate at which exposed individuals become infectious. The strength of this model is that it is somewhat more realistic, since exposed individuals do not immediately become infectious. It also helps us understand the impact of isolating exposed individuals on the dynamics of disease spread (Figs. 11 and 12).
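The four SEIR equations can be stepped forward in time in the same way as any other system of differential equations. The following is a minimal sketch using forward Euler steps, which is our simplification and not necessarily the method of the cited studies. The values β = 0.3, δ = 0.2, and γ = 0.1, the starting fractions, and the function name are hypothetical.

```python
def simulate_seir(beta, delta, gamma, s_init, e_init, i_init=0.0, r_init=0.0,
                  days=300, dt=0.1):
    """Integrate the SEIR equations with forward Euler steps of dt days.

    Returns a list of (t, s, e, i, r) tuples, one per time step.
    """
    s, e, i, r = s_init, e_init, i_init, r_init
    history = [(0.0, s, e, i, r)]
    for k in range(1, int(round(days / dt)) + 1):
        new_exposed = beta * s * i * dt  # fraction moving from s to e
        new_infectious = delta * e * dt  # fraction moving from e to i
        new_removed = gamma * i * dt     # fraction moving from i to r
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        history.append((k * dt, s, e, i, r))
    return history


# Hypothetical outbreak: 1% of the population is exposed and nobody is infectious yet.
history = simulate_seir(beta=0.3, delta=0.2, gamma=0.1, s_init=0.99, e_init=0.01)
peak_t, _, _, peak_i, _ = max(history, key=lambda row: row[3])
print(f"peak i = {peak_i:.3f} on day {peak_t:.1f}")
```

Even though nobody is infectious on day 0 in this example, the exposed group feeds the infectious group at rate δ, and the outbreak proceeds from there.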

Fig. 11

Sequence of events in an SEIR model

Fig. 12

SEIR model evolving as a function of time

SUQC Model

Looking at a more practical example, Zhao et al. investigated the spread of COVID-19 in different parts of China [5]. They used a susceptible, unquarantined infected, quarantined infected, confirmed infected (SUQC) model, where S(t) is the number of susceptible individuals as a function of time, U(t) is the number of unquarantined infected cases, Q(t) is the number of quarantined infected cases, and C(t) is the number of confirmed cases as a function of time. The number of removed individuals, R, was not included in this model [5]. Finally, the total number of infected individuals at a given time is I(t) such that [5]

$$ I(t)=U(t)+Q(t)+C(t) $$

The SUQC model is unique in that it shows the direct impact of quarantine measures on disease spread. In this model, α is the number of individuals infected by an unquarantined individual per day. The rate of change in the number of susceptible individuals is given by the following equation [5]:

$$ \frac{dS}{dt}=\frac{-\alpha \times U(t)\times S(t)}{N} $$

Again, the negative sign indicates that the number of susceptible individuals can only decrease over time. Other factors that affect it include the magnitude of α, the total number of susceptible individuals, and the number of unquarantined individuals, who are more likely to spread the illness.

γ1 is the rate at which unquarantined infected individuals get quarantined and β is the rate at which cases are confirmed. This gives these additional equations [5]:

$$ \frac{dU}{dt}=\frac{\alpha \times U(t)\times S(t)}{N}-{\gamma}_1\times U(t)\quad \text{(rate of change of unquarantined cases)} $$
$$ \frac{dQ}{dt}={\gamma}_1\times U(t)-\beta \times Q(t)\quad \text{(rate of change of quarantined cases)} $$
$$ \frac{dC}{dt}=\beta \times Q(t)\quad \text{(rate of change of confirmed cases)} $$

Notice the similarities between the equations used in the SUQC model and those used in the SEIR model. Even though there is no removed group, individuals in the quarantined group are effectively removed from the population temporarily until their test results come back. If the diagnosis is confirmed, they continue to be effectively removed from the population, preventing more disease spread (Fig. 13) [5].
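To illustrate how the quarantine rate shapes an outbreak, the sketch below integrates the four SUQC equations exactly as written above, using forward Euler steps. This is a simplified illustration and not a reproduction of the analysis in [5]. The population size, the parameter values, and the function name are all hypothetical.

```python
def simulate_suqc(alpha, gamma1, beta, N, U_init, days=200, dt=0.1):
    """Forward Euler integration of the SUQC equations; returns the final (S, U, Q, C)."""
    S, U, Q, C = float(N - U_init), float(U_init), 0.0, 0.0
    for _ in range(int(round(days / dt))):
        infections = alpha * U * S / N * dt  # individuals moving from S to U
        quarantined = gamma1 * U * dt        # individuals moving from U to Q
        confirmed = beta * Q * dt            # individuals moving from Q to C
        S -= infections
        U += infections - quarantined
        Q += quarantined - confirmed
        C += confirmed
    return S, U, Q, C


# Hypothetical city of one million people with 100 unquarantined cases on day 0.
params = dict(alpha=0.3, beta=0.2, N=1_000_000, U_init=100)
slow = simulate_suqc(gamma1=0.1, **params)  # slow quarantine: about 10 days on average
fast = simulate_suqc(gamma1=0.5, **params)  # fast quarantine: about 2 days on average
print(f"confirmed cases with slow quarantine: {slow[3]:,.0f}")
print(f"confirmed cases with fast quarantine: {fast[3]:,.0f}")
```

With these made-up numbers, the fast quarantine rate γ1 exceeds α, so unquarantined cases are removed faster than they infect others and the outbreak shrinks from the start. With the slow rate, the same starting point leads to far more confirmed cases.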

Fig. 13

The relationship between the various compartments of an SUQC model. Notice the slope of each graph at any given point. Adapted from [5]