1 Introduction

COVID-19 (Coronavirus Disease 2019), the name given by the World Health Organization (WHO) to the disease caused by the novel coronavirus, is currently the most discussed topic worldwide. The virus had not been observed before, hence it is called ‘novel’. It originated in China, where the first case was observed in Wuhan city in December 2019. Since then, it has spread worldwide and has affected more than 214 countries and territories so far. Present statistics indicate that the most developed countries, i.e., first world countries such as the USA, Italy, France, Spain and Germany, are suffering more from the coronavirus than other countries in the world. According to WHO, 35,944,662 confirmed cases and 1,051,875 deaths have been registered globally to date [1]. The rapid growth in the number of affected and deceased people is critical, and governments have therefore been forced to take hard decisions such as complete or partial lockdown in their respective countries.

Several research works have reported different factors of the coronavirus outbreak [2,3,4]. Previously published reports are based on the prediction of the end dates of COVID-19 in different countries [5], the size at the end [6], the date-wise prediction of spreading in the affected countries [7, 8], etc. For example, Mahalle et al. [9] have discussed some challenges and have recommended some strategies to control the outbreak. In their report, different prediction techniques have been adopted using different platforms including mathematical modelling, big data, social media posts, data science and machine learning. Furthermore, Bhola et al. [10] have shown that the only way to decrease the number of new infectives over time is to isolate them from the susceptible population. An influenza epidemic about a century ago showed how the situation can worsen day by day through exponential growth caused by the large number of susceptible people in India. The same can be repeated if the current situation goes uncontrolled or is handled casually. The impact of travel restrictions due to COVID-19 has been discussed in [11], where the modelling results have shown that sustained 90% travel limitations to and from China affect the epidemic route differently. The results suggest that isolation, home quarantine and early detection are more effective than travel restriction, because the latter delayed the progression of the virus within China by only 3–5 days and could not stop its propagation.

The situation in China, mainly in six provinces, has been analysed in [12,13,14], whereas that in Germany [15], Portugal [16], the United States [17], Italy [18] and Brazil [19] has been discussed in the recent literature. Although the spread of COVID-19 has increased exponentially all over the globe, the fatality rate is not so alarming, which gives residents of every country the confidence to fight and overcome this pandemic together. However, the fatality rate for severe cases has been observed to be 10%, for which governments should adopt a proper series of strategies as early as possible. Numerical analysis of COVID-19 has been done by Bai et al. [20], where recovery rate and transmission rate characteristics have been tracked and the future trend has been predicted. The deterministic Susceptible-Infected-Removed (SIR) model can be extended to a stochastic model in future to achieve better precision in the experimental results. A novel feature of this virus is its ability to spread during the latent period; thus a time delay has been included in the differential equations reported by Chen et al. [21] to propose a dynamic model describing the pandemic in China. The transmission route of the virus has been analysed in [22] using decision tree and prior algorithms; in that approach, some data may be missed and more algorithms need to be added to check the overall performance. The spread of COVID-19 in Hubei, China has been modelled and predicted in [23], where population characteristics such as the effect of age, other health problems and the steps taken to control the spread have not been analysed. Yafia has presented a model whose dynamics, with social distancing and isolation effects, have been applied to the Moroccan population in [24]. The level of abstraction of the available data is a major concern in [25], where modelling of biological systems has been considered to understand the spreading pattern of COVID-19. Yichi et al. [26] have presented a time series and kinetic analysis model and have shown that emergency interventions by the Chinese government restricting people from going out have a great impact on the spread of the pandemic. Experimental results establish that bodies of deceased people do not cause additional infection and that recovered people benefit from the antibodies developed in their bodies, which prevent later reinfection. Delay in identifying symptoms and Chinese medicine preparation have been considered as factors contributing to the modified viral shedding dynamics of COVID-19, as reported by He et al. [27]. The first 425 cases have been considered by Guan et al. [28] to determine epidemic characteristics in terms of mean incubation time [28, 29], mean serial interval, confidence interval and the basic reproduction number, concluding that transmission has taken place since the middle of December 2019. The transmission record of the early stage from January 10 to January 24 has been analysed and exponential growth has been observed. The experimental results identify the potential of the virus to cause outbreaks in Mainland China. With the increase in the reporting rate from 8-fold to 0-fold, the mean of the basic reproduction number has been calculated in [30], which is slightly higher than the same measure reported by WHO. The measurement typically depends on the accuracy of determining Susceptible-Infected, which requires a sufficient number of patient samples over a significant follow-up time.

In comparison with China, higher death rates and a lower recovery rate due to Severe Acute Respiratory Syndrome Corona Virus 2 (SARS-CoV-2) have been observed in Italy, which indicates that most of the Italian cases are of severe infection because of delayed lockdown and initially weaker restrictions [31, 32]. Awareness among the general public is essential for effective government interventions, and the number of affected people in Iran has been estimated up to 15th April in the research work of Zareie et al. [33]. Twenty countries from all over the world have been considered in [34] to provide reliable statistics of the COVID-19 scenario. The transmission model has been parameterized using values from the existing literature or other reasonable values, which depend to some extent on the strength and speed of the drugs to be administered. The basic reproduction number, non-affecting period, pre-symptom affecting period, probability of severity development, probability of severity diagnosis and probability of mild case diagnosis are the factors contributing to successful pandemic modelling. As the second epicentre of the COVID-19 spread, Italy is the main focus of the study conducted by Russo et al. [35]. In their work, they have used the SEIRD model (S: Susceptible, E: Exposed, I: Infected without symptoms, R: Recovered, D: Deceased) considering two categories of population, namely asymptomatic and symptomatic. A genetic algorithm has been used to optimize the objective function based on all combinations of initial predictions of pandemic dynamics.

The control reproduction number and the basic reproduction number are the two factors of analytical calculation in the work of Nadim et al. [36], which focuses on observing the dynamics of this epidemic in five provinces of China. Their research outcomes show that management of quarantined people is more effective than management of isolated people. The impact of border control on the transmission of COVID-19 has been reported in [37] by investigating the outbreak trend in the ten most visited cities of China, where risk assessment has been done. Volpert et al. [38] have analysed disease transmission data and have found three possible patterns, i.e., growth-decay-growth dynamics, growth-decay dynamics and growth dynamics. The evolution of COVID-19 has been analysed and future possibilities have been predicted in [39] using an extended version of the composite Monte-Carlo model integrated with deep learning and fuzzy rules. The main focus of that work is decision making, whereas the other existing works concentrate on disease modelling only. The duration and severity of lockdown should be weighed against the impact on the economy. Whenever these measures are relaxed a bit, the disease transmission trend returns to its original exponential growth [40]. The impact of COVID-19 on the economy has been evaluated in [41] by solving a stylized production-based asset pricing model using the transmission rate estimated by the SIR model for China, Japan and Korea. The four most infected regions of China and the five most infected regions of Italy have been the focus of research in [42], where the assumption of COVID-19’s remission during summer has been investigated by considering three environmental factors, i.e., the highest wind speed, maximum temperature and maximum relative humidity. The findings show that the effects of the first and third environmental factors are negligible, whereas the impact of the second factor on virus transmission ranges from negligible to moderate. However, only preliminary data have been used there; a stronger conclusion may be drawn when more data become available. In the current situation, where the worldwide infection and death rates are alarming, the world needs quick recovery, and for that, proper prediction of the transmission range and transmission trend and sound decision making on lockdown measures are needed. From the existing literature, it is clear that there have been many attempts to predict the disease-spreading trend, but all of them have considered either only their own country or at most two or three countries worldwide.

However, to implement isolation and social distancing, we must follow the guidelines adopted by the countries that have removed COVID-19. It is also necessary to analyse the decisions made by all of them, and factors such as population density, because of which decisions should vary country by country, should also be taken into account for better understanding and analysis of the situation. In this regard, our proposed work applies approaches such as the well-known SIR model and regression, which have a history of accurate prediction for earlier diseases [15], to obtain better accuracy and efficiency in prediction. This paper contributes to the field through prediction for fifteen countries from the first, second and third world and through decision making for lockdown there. None of the existing works has considered worldwide disease transmission trends to predict viral infection, and therein lies the novelty of our proposed work. The main contributions of the work are summarized as follows:

  • A time dependent discrete SIR (S: Susceptible cases, I: Infected cases, R: Removed cases) model has been adopted to predict and analyse the disease transmission trend.

  • The change in the basic reproduction number, which is the indicator of the transmission trend rather than of the infection rate, shows that it has decreased after lockdown. This indicates that the lockdown decision is essential globally to control the massive impact of COVID-19 on the common people, especially on the population of third world countries, where economic conditions are too poor to handle the situation.

  • However, stricter enforcement of the steps taken by government authorities to maintain social distancing and lockdown decisions could make the lockdown a success.

A comparative summary of the existing works, together with the motivation for our proposed work arising from their shortfalls, is presented in Table 1. In comparison with the other existing works in Table 1, the following conclusions can be drawn:

  1. Except for the work in reference [7], all the earlier reported works listed in Table 1 have considered only one country, whereas our proposed work is based on the most affected countries worldwide. Our research has been conducted on the most affected first world, second world and third world countries.

  2. In our proposed work the basic reproduction number (R0) has been reported as less than 1, which indicates the effectiveness of lockdown.

  3. Regression analysis has been applied in our proposed work, and exponential smoothing analysis has also been applied over a short duration of time in our proposed model (the effect of implementing certain lockdown measures has been analysed in our experimental results). On 25/02/2021 some data were collected and analysed with our proposed model. It has been observed that our proposed model offers satisfactory results for the prediction of active COVID-19 cases. For example, the numbers of infected persons (official data) in the USA, India, Mexico, Russia and Brazil are 9,096,024, 157,418, 263,479, 3,595,608 and 818,529 respectively, whereas the regression estimates for those countries are 9,008,471, 601,194, 273,892, 587,142 and 837,845 respectively. Moreover, applying the exponential smoothing function, the numbers of infected persons for those countries are calculated as 8,282,930, 265,073, 224,578, 513,627 and 682,703 respectively.

  4. In our results, the infected population of a first world country (like the USA) has increased exponentially to date, which means it has followed its previous trend, and our regression analysis follows the same trend closely. However, after 21st September 2020 the number of infected people in some third world countries (like India) has decreased sharply, and our smoothing analysis follows these results clearly. Moreover, it has been noticed that after 31st January 2020, the number of infected people in a second world country (like Russia) has fallen markedly. Since the effects of lockdown and government measures always influence our experimental results, decisions based on our proposed algorithms support better judgement for this COVID-19 epidemic.

Table 1 Comparative study of proposed work with existing works

The remaining portions of the paper are arranged as follows. After the introduction in Section 1, a detailed data analysis of COVID-19 for the period of February 2020 to September 2020 is presented in Section 2. Section 3 discusses the proposed methodology for predicting the lockdown end dates in the fifteen countries from the first, second and third world. The experimental results with comparison and performance evaluation are presented in Section 4. Furthermore, Section 5 discusses the plans of action taken by the planning commission and health care organizations, and finally, the conclusion and future directions of our proposed work are discussed in the last section.

2 Data analysis

Forecasting methods can help governments to take the right decisions against this pandemic and proceed accordingly. In this section, some data analysis has been carried out for the period of February 2020 to September 2020. Figure 1 shows that the USA is leading in the total number of positive cases of COVID-19 and India holds the second position, whereas second world countries like Kazakhstan, Uzbekistan and Romania and third world countries like Philippines, Mexico and Thailand have not been affected much.

Fig. 1 Country wise total cases up to 15.09.2020

Figure 2 shows the number of positive cases per 1 million of population. Here the concept of population density comes in, because the higher the population density, the greater the chance of spreading the virus. Brazil and the USA are facing the highest number of infections per 1 million of population, whereas Spain is not far behind. From 1st February to 15th September, the daily active cases have been recorded and are shown country wise in Fig. 3. From Fig. 3 it can be observed that the daily active cases have consistently been highest in the USA. On the other hand, the daily active cases of Brazil have been decreasing from the second week of August, whereas the daily active cases of India are still increasing. The total cumulative deaths have also been recorded country wise, as depicted in Fig. 4. The death toll is highest in the USA at approximately 2.2 lakhs, whereas in Brazil, India and Mexico the total deaths are 1.4 lakhs, 1 lakh and 76 thousand respectively. The daily recovered cases are shown in Fig. 5.

Fig. 2 Country wise total cases per 1 million population up to 15.09.2020

Fig. 3 Date and country wise total active cases up to 15.09.2020

Fig. 4 Date and country wise total deaths up to 27.09.2020

Fig. 5 Date and country wise total recovered up to 27.09.2020

At present, the worldwide death rate due to COVID-19 per day is approximately 5% and the recovery rate per day is approximately 95%; the same rates for first world, second world and third world countries are shown in Table 2.

Table 2 Country wise recovery and death rate mapping with population

3 Proposed methodology

Hethcote has analysed different mathematical models in [43] and applied them to specific diseases according to the trend of spreading. The contact number, basic reproduction number and replacement number of the SIR (S: Susceptible cases, I: Infected cases, R: Removed cases), MSEIR (M: People having passive immunity, S: Susceptible population, E: People exposed to the disease, but not yet affected, I: Infected and spreading people, R: Recovered people having permanent immunity) [44] and SEIR models [45] have been reviewed and discussed there, and these models have later been used to estimate various disease patterns. The basic reproduction number, denoted as R0, can be defined as the average number of people infected due to the entrance of one infected person into a susceptible population. Age structure, spatial structure and heterogeneity have been considered as the main factors contributing to the trend of disease spread.

3.1 SIR model for COVID-19

The SIR model for detecting any disease as an epidemic states that if λ is the density of susceptible births, then

$$ N=S+I+R=\mathrm{constant}\ \mathrm{when}\ \lambda =0 $$
(1)

where
N: total population
S: total susceptible people
I: total infected people
R: total removed people (dead/recovered)

and the rate of change in susceptible population w.r.t. time is

$$ \frac{dS}{dt}=\lambda -r(t)\left(1-\alpha (t)\right)\left(1-\mu (t)\right)S(t)I(t)-\alpha (t)S(t) $$
(2)

The rate of change in infected population w.r.t. time is

$$ \frac{dI}{dt}=r(t)\left(1-\alpha (t)\right)\left(1-\mu (t)\right)S(t)I(t)-a(t)I(t)-\mu (t)I(t) $$
(3)

The rate of change in removed population w.r.t. time is

$$ \frac{dR}{dt}=a(t)I(t)+\mu (t)I(t)+\alpha (t)S(t) $$
(4)

where r = rate of getting infected, a = average rate of recovery and death, α = lockdown rate of susceptible and μ = isolation rate of infectious.

Therefore, when λ = 0, the total rate of change is

$$ \frac{dS}{dt}+\frac{dR}{dt}+\frac{dI}{dt}=0 $$
(5)
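As a rough illustration of Eqs. (2) to (5), the sketch below integrates the three rates with a forward Euler step. The function names, the step size and all parameter values are assumptions chosen for illustration only, not values from this study; the rates are held constant in time for simplicity, and S, I and R are expressed as population fractions.

```python
def sir_rates(S, I, lam, r, a, alpha, mu):
    """Right-hand sides of Eqs. (2)-(4) for one state and one parameter set."""
    new_infections = r * (1 - alpha) * (1 - mu) * S * I
    dS = lam - new_infections - alpha * S
    dI = new_infections - a * I - mu * I
    dR = a * I + mu * I + alpha * S
    return dS, dI, dR


def simulate_sir(S_init, I_init, R_init, days, dt=0.1, lam=0.0,
                 r=0.3, a=0.1, alpha=0.01, mu=0.05):
    """Forward Euler integration; all default values are hypothetical."""
    S, I, R = S_init, I_init, R_init
    for _ in range(int(round(days / dt))):
        dS, dI, dR = sir_rates(S, I, lam, r, a, alpha, mu)
        S, I, R = S + dt * dS, I + dt * dI, R + dt * dR
    return S, I, R


S, I, R = simulate_sir(S_init=0.99, I_init=0.01, R_init=0.0, days=60)
# With lam = 0 the three rates cancel, so S + I + R stays constant (Eq. 5).
print(f"S={S:.4f}, I={I:.4f}, R={R:.4f}, total={S + I + R:.4f}")
```

Setting `lam` to a non-zero value makes the three rates sum to λ instead of zero, which is why Eq. (1) requires λ = 0 for a constant population.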

The susceptible fraction of the population

$$ s(t)=\frac{S(t)}{N} $$
(6)

The infected fraction of the population

$$ i(t)=\frac{I(t)}{N} $$
(7)

The removed fraction of the population

$$ r(t)=\frac{R(t)}{N} $$
(8)

and

$$ s(t)+i(t)+r(t)=1 $$
(9)

It has been assumed that 0 ≤ l ≤ 1 and 0 ≤ i ≤ 1, where l is the lockdown rate of susceptible people and i is the isolation rate of infectious people. The fraction of susceptible protected people is (1-l) S and the fraction of un-isolated infectious people is (1-i) S. The basic reproduction number, denoted by R0, has been expressed in different forms during the first, second and third phases of the outbreak in different countries.

The durations of the first, second and third stages of the outbreak in all fifteen countries considered by the authors are presented in Table 3, where the first stage denotes the appearance stage, the second stage deals with local transmission and the third stage is concerned with community transmission. The initial stage is where the disease is first introduced and active cases start to emerge; in this stage, almost everyone is protected, because very few people have come into contact with persons with a travel history. Local transmission is the spread of the disease from a person who has a travel history to local people (family, neighbours and friends), or the spread from a directly contacted infectious person. Community transmission is the most dangerous stage, where the source of transmission is very difficult to trace: the newly affected people have no history of contact with people arriving from abroad or with people belonging to hotspot areas. Phase 4, the widespread outbreak, was initially faced by China only, until the numbers of cases and deaths began to grow exponentially elsewhere; Italy and the USA are apparently in this stage now.

Table 3 Country wise duration of different stages of outbreak [46, 47]

3.2 Time dependent discrete SIR

Using the time dependent discrete SIR model, Eqs. (2), (3) and (4) have been modified as follows.

$$ S\left(t+1\right)-S(t)=\frac{-r(t)S(t)I(t)}{N} $$
(10)
$$ I\left(t+1\right)-I(t)=\frac{r(t)S(t)I(t)}{N}-a(t)I(t) $$
(11)
$$ R\left(t+1\right)-R(t)=a(t)I(t) $$
(12)

At the very first stage, the number of confirmed cases is very low and it can be assumed that most of the total population is in susceptible stage (N ≈ S(t)). So, by modifying Eq. (11), we get

$$ I\left(t+1\right)-I(t)=r(t)I(t)-a(t)I(t) $$
(13)

Daily recovery rate and transmission rate can be denoted by

$$ a(t)=\frac{R\left(t+1\right)-R(t)}{I(t)} $$
(14)

And

$$ r(t)=\frac{I\left(t+1\right)-I(t)+R\left(t+1\right)-R(t)}{I(t)} $$
(15)
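Eqs. (14) and (15) can be evaluated directly from the daily series of active and removed cases. The following is a minimal sketch; the function name `daily_rates` and the sample numbers are hypothetical, and days with no active cases are assigned a rate of 0 as an assumption, since the ratios are undefined there.

```python
def daily_rates(infected, removed):
    """Return (a, r): recovery and transmission rates for t = 0 .. T-2."""
    a, r = [], []
    for t in range(len(infected) - 1):
        if infected[t] == 0:
            # Assumption: rates are undefined without active cases, use 0.
            a.append(0.0)
            r.append(0.0)
            continue
        d_removed = removed[t + 1] - removed[t]
        d_infected = infected[t + 1] - infected[t]
        a.append(d_removed / infected[t])                  # Eq. (14)
        r.append((d_infected + d_removed) / infected[t])   # Eq. (15)
    return a, r


# Hypothetical active-case and removed-case counts on four consecutive days.
infected = [100, 120, 150, 180]
removed = [10, 15, 22, 30]
a, r = daily_rates(infected, removed)
print(a, r)
```

By construction, inserting these rates back into Eq. (13) reproduces the next day's active cases, e.g. (1 + r(0) − a(0)) · I(0) = I(1).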

3.2.1 Detection of transmission and recovery rate using ridge regression

A finite impulse response filter [48] can be used to predict the transmission and recovery rates as follows

$$ {r}^{\prime }(t)={p}_0+{p}_1r\left(t-1\right)+{p}_2r\left(t-2\right)+{p}_3r\left(t-3\right)+\dots +{p}_Cr\left(t-C\right)={p}_0+\sum \limits_{c=1}^C{p}_cr\left(t-c\right) $$
(16)
$$ {a}^{\prime }(t)={q}_0+{q}_1a\left(t-1\right)+{q}_2a\left(t-2\right)+{q}_3a\left(t-3\right)+\dots +{q}_Da\left(t-D\right)={q}_0+\sum \limits_{d=1}^D{q}_da\left(t-d\right) $$
(17)

where C and D are the orders of the two filters with coefficients pc (0 ≤ c ≤ C) and qd (0 ≤ d ≤ D) respectively, and r′(t) and a′(t) are the predicted transmission rate and predicted recovery rate. Among machine learning methods such as partial least squares, regularized least squares (also called ridge regression) and ordinary least squares, ridge regression is adopted here, which leads to the following optimization problems

$$ {\min}_{p_c}\sum \limits_{t=C}^{T-2}{\left(r(t)-{r}^{\prime }(t)\right)}^2+{\gamma}_1\sum \limits_{c=0}^C{p}_c^2 $$
(18)

And

$$ {\min}_{q_d}\sum \limits_{t=D}^{T-2}{\left(a(t)-{a}^{\prime }(t)\right)}^2+{\gamma}_2\sum \limits_{d=0}^D{q}_d^2 $$
(19)

where γ1 and γ2 are the regularization parameters of the two problems respectively.
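The optimization problem in Eq. (18) has a closed-form solution through the regularized normal equations. The sketch below is a self-contained plain-Python illustration of that solution, not the implementation used in this work; the helper names, the sample rate series, the filter order and the value of γ are all assumptions. The same function applies to Eq. (19) when the recovery-rate series is passed instead.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_fir_ridge(series, order, gamma):
    """Coefficients [p_0, ..., p_C] minimizing Eq. (18) for one rate series."""
    rows = [[1.0] + [series[t - c] for c in range(1, order + 1)]
            for t in range(order, len(series))]
    targets = series[order:]
    k = order + 1
    # Regularized normal equations: (X^T X + gamma * I) p = X^T y.
    A = [[sum(row[i] * row[j] for row in rows) + (gamma if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(row[i] * y for row, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(A, b)


def predict_next(series, coeffs):
    """One-step prediction of Eq. (16) from the most recent values."""
    order = len(coeffs) - 1
    return coeffs[0] + sum(coeffs[c] * series[-c] for c in range(1, order + 1))


# Hypothetical daily transmission rates r(0), ..., r(T-2).
rates = [0.30, 0.28, 0.27, 0.25, 0.24, 0.22, 0.21, 0.20]
p = fit_fir_ridge(rates, order=2, gamma=0.01)
print(p, predict_next(rates, p))
```

Note that Eq. (18) penalizes p0 together with the other coefficients, which the sketch follows; a positive γ also keeps the linear system well posed when the lagged values are nearly collinear.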

3.2.2 Detection of number of affected persons and recovered persons

Finite impulse response filter can be used to forecast the amount of affected and recovered population by previously discussed time dependent SIR model. After calculating the transmission rate and recovery rate according to (14) and (15), the ridge regression is used to solve the optimization problems of (18) and (19) keeping the constraints of (16), (17) to learn the pattern of coefficients of finite impulse response filter.

For predicting the affected and recovered population, r(t) and a(t) have been replaced by r′(t) and a′(t) in Eqs. (12) and (13), where the predicted numbers of affected and recovered people at time t = T are denoted as I′(T) and R′(T). This leads to

$$ {I}^{\prime }(T)=\left(1+{r}^{\prime}\left(T-1\right)-{a}^{\prime}\left(T-1\right)\right)I\left(T-1\right) $$
(20)
$$ {R}^{\prime }(T)=R\left(T-1\right)+{a}^{\prime}\left(T-1\right)I\left(T-1\right) $$
(21)

When t > T the affected and recovered population can be predicted as follows

$$ {I}^{\prime}\left(t+1\right)=\left(1+{r}^{\prime }(t)-{a}^{\prime }(t)\right)I(t) $$
(22)

and

$$ {R}^{\prime}\left(t+1\right)=R(t)+{a}^{\prime }(t)I(t) $$
(23)

The deterministic epidemic model considered here is more appropriate for large populations. When I(t) and R(t) are comparatively small, the accuracy may not be as desired; in that case, a stochastic model such as a Markov chain can be used to approximate and forecast the quantities as required. The algorithm of the prediction method is as follows.

figure f
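The prediction steps of this subsection can be sketched as a short loop over Eqs. (16), (17) and (20) to (23). This is an illustrative reading of the method and not a reproduction of the algorithm figure; the function names are hypothetical, the filter coefficients are assumed to have been fitted beforehand, and clipping negative predicted rates at 0 is an assumption made here to keep the counts meaningful.

```python
def fir_predict(history, coeffs):
    """One-step FIR prediction, Eqs. (16)/(17): c_0 + sum_k c_k * x(t - k)."""
    order = len(coeffs) - 1
    return coeffs[0] + sum(coeffs[k] * history[-k] for k in range(1, order + 1))


def forecast(I_last, R_last, r_hist, a_hist, p, q, horizon):
    """Roll Eqs. (20)-(23) forward for `horizon` days.

    I_last, R_last : last observed active and removed counts
    r_hist, a_hist : past transmission and recovery rates (oldest first)
    p, q           : fitted FIR coefficients [p_0..p_C] and [q_0..q_D]
    """
    r_hist, a_hist = list(r_hist), list(a_hist)
    I, R = float(I_last), float(R_last)
    path = []
    for _ in range(horizon):
        r_hat = max(fir_predict(r_hist, p), 0.0)  # assumption: no negative rates
        a_hat = max(fir_predict(a_hist, q), 0.0)
        # Both updates use the previous day's I, as in Eqs. (22) and (23).
        I, R = (1.0 + r_hat - a_hat) * I, R + a_hat * I
        # Predicted rates feed the filter for the following day.
        r_hist.append(r_hat)
        a_hist.append(a_hat)
        path.append((I, R))
    return path


# Hypothetical example: order-1 filters that simply repeat the last rate.
for day, (I, R) in enumerate(forecast(120, 15, [0.25], [0.05],
                                      p=[0.0, 1.0], q=[0.0, 1.0], horizon=3), 1):
    print(day, round(I, 2), round(R, 2))
```

Beyond the first forecast day the loop has no observations to draw on, so it reuses its own predicted counts and rates, which is one natural reading of Eqs. (22) and (23).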

3.3 Steady state solutions

According to the steady state solution, the probable duration of susceptibility can be defined by P (min (TL | TS)) where TL is the duration of being alive and TS is the duration of remaining in susceptible state before being affected. The probable duration can be expressed by

$$ P\left(\min \left({T}_L\mid {T}_S\right)\right)={\int}_0^{\infty }{e}^{-\left(\mu +\lambda \right)x} dx=\frac{1}{\mu +\lambda } $$
(24)

At the steady-state equilibrium, the basic reproduction number is

$$ {R}_0=\frac{r}{a} $$
(25)

where \( \frac{dI}{dt}>0\ \mathrm{when}\ {R}_0>1 \), which implies that the number of affected people increases when the reproduction number exceeds 1 and decreases when it falls below 1.

3.4 Trend of disease progression with lockdown

The removed population R depends on S and I, so here we have restricted our analysis to S and I. As the disease spreads over a longer time, new births have been taken into account. In this scenario, the system can be represented by

$$ \frac{dS}{dt}=\lambda -\left(1-\alpha \right)r\left(1-\mu \right) SI-\alpha S $$
(26)
$$ \frac{dI}{dt}=\left(1-\alpha \right)r\left(1-\mu \right) SI- aI-\mu I $$
(27)

To assess the effectiveness of the social distancing approach, we have divided the infected population into two categories by extending the conventional SIR model. The first category, Type I, consists of detectable infected persons, and the second, Type II, consists of undetectable infected persons. If the probability of an infected person being detectable is denoted by p1 and that of being undetectable by p2, then

$$ {p}_1+{p}_2=1 $$
(28)

If the transmission rates among Type I and Type II people are denoted by r1 and r2 respectively and the corresponding recovery rates are a1 and a2, then

$$ {R}_0={p}_1\frac{r_1}{a_1}+{p}_2\frac{r_2}{a_2} $$
(29)

In a practical scenario, the Type II population has a higher transmission rate than the Type I population. The controllability of the disease is indicated by the spectral radius, which in turn shows that if the basic reproduction number R0 > 1 there is an outbreak, and if R0 < 1 there is no outbreak.
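A small numerical sketch of Eqs. (28) and (29) shows how the share of undetectable cases moves R0 across the outbreak threshold. The function name and all rate values below are hypothetical and chosen only to illustrate the formula.

```python
def basic_reproduction_number(p1, r1, a1, r2, a2):
    """R0 for detectable (Type I) and undetectable (Type II) infections."""
    if not 0.0 <= p1 <= 1.0:
        raise ValueError("p1 must be a probability")
    p2 = 1.0 - p1                          # Eq. (28)
    return p1 * r1 / a1 + p2 * r2 / a2     # Eq. (29)


# Hypothetical rates: Type II transmits faster and is removed more slowly.
for p1 in (0.9, 0.5):
    r0 = basic_reproduction_number(p1, r1=0.1, a1=0.2, r2=0.3, a2=0.1)
    status = "outbreak" if r0 > 1 else "no outbreak"
    print(f"p1={p1}: R0={r0:.2f} ({status})")
```

With these assumed rates, detecting 90% of infections keeps R0 below 1, while detecting only half of them pushes it above 1.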

Two approaches to maintaining social distance have been considered here: allowing everyone to keep their interpersonal contacts up to a fraction of normal contacts, and cancelling mass gatherings. The countrywide lockdown announced by the government has acted as an infection controller and has helped India to face the challenge of COVID-19.

4 Experimental results and discussion

The following section describes different aspects of our proposed work for analysis of lockdown effect on the disease transmission and discusses the contribution in terms of novelty and performance evaluation as well.

4.1 Dataset preparation and validation

The proposed work for analysis and prediction of worldwide COVID-19 has been conducted using a dataset collected from Worldometer [46], which is developed and maintained by an international group of researchers and developers with the aim of making world statistics available in a time relevant format, and which has been recognised as the best free reference website by the American Library Association. Several researchers have used this source for their research on COVID-19. The numbers of daily new cases, daily recovered population, date wise total cases and active cases have been used as parameters for analysis and prediction of the disease transmission trend and the impact of lockdown on it.

4.2 Analysis of lockdown effect on disease transmission

Many factors undoubtedly affect the propagation of the virus, so establishing a dynamic propagation model from the estimated factors is a difficult task. Basic estimation of some parameters such as mortality and latency can help people to be cautious about the transmission trend and act accordingly. The time evolution of the outbreak has been analysed with the time dependent SIR model and is represented in Figs. 6, 7 and 8. The result of mathematical modelling of date-wise actual recovered population vs predicted recovered population using the SIR model is shown in Fig. 6. From the figure, it is clear that our predicted results are quite close to reality and that the proposed model suits its domain. As it is not feasible to provide the complete basic reproduction number calculations and daily new case predictions in the paper, Figs. 6, 7 and 8 show graphical representations of our results.

Fig. 6 Date-wise actual total recovered population vs predicted total recovered population in five most affected countries

Fig. 7 Actual daily new cases vs predicted new cases

Fig. 8 Change in basic reproduction number after lockdown in the five most affected countries

Scikit-learn, a free machine learning library for Python 3, has been used to run the ridge regression calculation. Since the transmission rate can never be negative, it is set to 0 when it is less than 0. The data before 15th February 2020 are insignificant and sometimes unavailable, which introduces noise into the trend analysis; hence only the data from 15.02.2020 onwards have been considered here. In an epidemic model, a few questions always arise, i.e., the end date of the epidemic and the total number of people infected in the future. To answer these questions, the basic reproduction number is defined as the number of people newly infected by an infected person before his/her recovery or death. In the classical SIR model, it is simply the ratio of the transmission rate to the recovery rate (Eq. (25)), which is later modified as a function of time in the time-dependent SIR model. When the basic reproduction number is greater than 1, the disease is said to spread exponentially and a certain fraction of the total population will eventually be affected.

The predicted daily new cases have been compared with the actual daily infected population, as shown in Fig. 7. It is clearly seen from Fig. 7 that the SIR model has been able to predict the infection rate and infected population efficiently for the twelve most affected countries. Some deviation exists, but the predicted graph follows the pattern of the actual one.

The effect of lockdown on the basic reproduction number (R0) of the five most affected countries, i.e., Brazil, India, Mexico, Russia and the USA, is shown in Fig. 8, which concerns the transmission rate, not the number of infected people. During strict lockdown, the infected population has increased exponentially, which means it has followed its previous trend. However, the basic reproduction number (the ratio of transmission rate to recovery rate) has decreased (R0 < 1), so we can conclude that the lockdown decision is effective. If lockdown is made stricter, the infected population can also decrease, and once a vaccine comes to the market, people will be safe.

Therefore, the experimental results show that the predictions of the infected and recovered populations have only minor errors when compared with the actual values. Moreover, we can track the transmission trend and its characteristics with respect to time for fifteen countries worldwide. Regarding the impact of social distancing during lockdown, it has been seen that proper social distancing can lead to a decrease in the propagation rate.

We have taken examples from January 22, 2020 to September 15, 2020. The dataset consists of 238 instances and four attributes. These attributes hold the number of daily new cases, the daily recovered population, the date-wise total cases and the active cases, which are used as the parameters for analysis. The statistical ANOVA measure indicates that the model is a strong predictor when all the factors are considered (see Table 4). Analysis of variance (ANOVA) comprises calculations that provide information on levels of variability within a regression model and form the basis for tests of significance. The significance of the ‘F’ value is 0; this is the ‘P’ value used to test the null hypothesis that all groups of data are drawn from populations with the same mean. If the ‘P’ value is greater than 0.05, then there is a chance that the null hypothesis is true (see Table 5). In Table 6, the authors report the Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE) and score for our predictor model.
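For reference, the error measures named above can be computed as in the following minimal sketch. It uses only the Python standard library, and it assumes that the "score" is the coefficient of determination (R²), which is an assumption rather than something stated in the text. The function name and sample values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R^2 for two equally long sequences."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum(e * e for e in errors)
    # R^2 is undefined for a constant 'actual' series (ss_tot == 0);
    # this sketch assumes the series is not constant.
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical values, for illustration only.
print(regression_metrics([3, 5, 7, 9], [2, 5, 8, 9]))
```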

Table 4 Statistical ANOVA measure
Table 5 ANOVA showing the Significance of p value to validate the model for prediction of Recovery rate
Table 6 Correlation coefficients of attributes

5 Plan of action taken by government authority

The main objective of the article is to prepare a mathematical model for decision making on lockdown during the COVID-19 period. Figures 1, 2, 3, 4 and 5 show that, except for India and Brazil, the second world and third world countries are not much affected by COVID-19. Owing to lockdown, the basic reproduction number (R0) of the five most affected countries, i.e. Brazil, India, Mexico, Russia, and USA, has been affected, as depicted in Figs. 6, 7 and 8. However, in our mathematical regression model the number of infected people increases with time, as shown in Fig. 9. It has also been observed from Fig. 9 that the number of infected people in India, Brazil and Russia is slightly lower from August 2020 onwards. Over a short period of time the number of infected people varies little, whereas it changes gradually over a long period of time. The predictions of active infected people up to July 2021 for the five most affected countries (shown in Fig. 9) have been made by applying time series analysis, where the value of the smoothing constant alpha lies in the range 0.2 to 0.6. Hence, the planning commission, healthcare organizations and government agencies may adopt the plans of action below, which are based on the analysis of our proposed prediction model and are discussed in the following subsections.

Fig. 9 Graphs for prediction of active COVID-19 infected people up to July 2021 for five most affected countries
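The role of the smoothing constant alpha in such a time series forecast can be illustrated with a minimal sketch. The exact smoothing variant used for Fig. 9 is not specified here, so the sketch assumes simple (single) exponential smoothing, in which each smoothed value is a weighted average of the latest observation and the previous smoothed value. The function names and the toy series are hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha*x_t + (1 - alpha)*s_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = [float(series[0])]  # initialise with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


def forecast(series, alpha, horizon):
    """Flat forecast: the last smoothed level repeated 'horizon' times."""
    level = exponential_smoothing(series, alpha)[-1]
    return [level] * horizon


# Hypothetical active-case counts; alpha is taken from the 0.2 to 0.6 range.
active_cases = [1000, 1100, 1250, 1200, 1300]
for alpha in (0.2, 0.4, 0.6):
    print(alpha, forecast(active_cases, alpha, horizon=3))
```

A smaller alpha gives more weight to the history and yields a smoother curve, whereas a larger alpha reacts faster to recent changes in the counts.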

5.1 Strict/partial lockdown

It has been observed that the infected population has increased exponentially during the lockdown period, which indicates that it follows its previous trend. On the other hand, the basic reproduction number (R0), which is the ratio of the transmission rate to the recovery rate, has decreased (R0 < 1) over the same period. Hence, it can be concluded that strict lockdown is an effective measure. Moreover, if the lockdown is strict and continues for a longer time, then the infected population can be reduced to a substantial extent.

On the other hand, a few governments (such as India's) have announced different unlock phases to re-open some offices and places in view of the economic benefits for their countries. It has also been observed that the number of infected people in those countries is still declining. Hence, it can be concluded that people are aware of and strictly following the COVID-19 protocol, i.e., maintaining social distancing, wearing a face mask in public, frequent hand washing/sanitizing, etc. Furthermore, the government authorities should execute a plan for continuous monitoring and self-isolation of people who are exposed or symptomatic. Finally, the government should also take special care of people aged 50 years and above [49].

5.2 Herd immunity

The government can take the necessary measures so that a large portion of a community becomes immune to the disease, as a result of which the whole community becomes protected. If the entire community is immune, then the government can withdraw the lockdown. However, it is important to know what percentage of people need to be immune to achieve herd immunity. Even though some people who have recovered from COVID may not have the required immunity, the recovered population as a whole is considered immune. Moreover, if the herd is exposed to the disease again, its severity would be very low. For example, in India 10.1 million people had already been exposed to COVID by January 2021. As a result, this part of the population has developed immunity against the disease; fresh exposure would not be a threat to them, and they would not transmit the disease to people in the country who have not developed immunity. Hence, non-immune people are relatively safe from COVID. The average reproduction number (R0) for India has been estimated as approximately 1.2 [50, 51]. Therefore, approximately 80% of a community in India needs to be immune in order to achieve herd immunity against COVID.
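The link between the reproduction number and the share of a population that must be immune is commonly expressed by the classical threshold 1 - 1/R0. The following minimal sketch evaluates that formula; it assumes a homogeneously mixing population and fully protective immunity, and it is not necessarily the calculation behind the percentage quoted above. The function names are hypothetical.

```python
def herd_immunity_threshold(r0):
    """Classical herd immunity threshold 1 - 1/R0 (0 when R0 <= 1)."""
    if r0 <= 1.0:
        # The epidemic already declines on its own, so no threshold applies.
        return 0.0
    return 1.0 - 1.0 / r0


def implied_r0(threshold):
    """Reproduction number that corresponds to a given threshold (< 1)."""
    return 1.0 / (1.0 - threshold)


for r0 in (1.2, 2.0, 5.0):
    print(f"R0 = {r0}: threshold = {herd_immunity_threshold(r0):.1%}")
print(f"threshold 80% corresponds to R0 = {implied_r0(0.8):.1f}")
```

Under this simple formula, R0 = 1.2 corresponds to a threshold of about 17%, whereas a threshold of 80% corresponds to R0 = 5, so a higher percentage would rest on additional assumptions that the formula does not capture.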

6 Conclusion

The role of lockdown for susceptible people and of isolation for infectious people has been discussed in this paper on the basis of the reproduction number of the SIR model of epidemic spread in general. Three phases of coronavirus spread in different countries have been considered for the analysis: the beginning/first stage of the outbreak (3rd–8th March), the intermediate/second stage of the outbreak (9th–19th March), and the third stage of contamination (20th March till date). In the first stage, the cumulative growth rate was moderate, whereas in the second stage of the epidemic the transmission rate became higher and the infection rate increased drastically. In the third and final stage, the lockdown and isolation effects have been taken into account, and the transmission rate has been smaller than in the previous phase. Since the basic reproduction number depends to some extent on lockdown decisions and actions, it has been evident that with increasing levels of lockdown and isolation the reproduction number has fallen below 1. The challenges of predicting such parameters include a) lack of proper data, b) no proper lockdown implemented by governments, and c) no proper tracking of active and new cases.

In the future, attempts can be made to overcome these shortcomings in order to obtain improved results. The experimental results show that our proposed approach is well suited to its domain. However, the prediction can change with external factors such as government decisions and human actions, so a model trained on today's data cannot be assumed to hold for a different situation in the future. The proposed modified SIR model is not only more productive than the traditional one, but also more capable of handling and adapting to the current situation.