1 Introduction

For the first time in December 2019, a new strain of Coronavirus was identified in Wuhan, China. People were diagnosed with pneumonia for no apparent reason and existing vaccines, and treatments were ineffective. The novel Coronavirus disease took the official name of COVID-19 from World Health Organization (WHO) when the number of deaths surpassed 1000 people. On January 30, 2020, WHO issued a statement declaring the novel Coronavirus, a public health emergency, posed a threat to the entire world, not just China. Following the COVID-19 pandemic crisis in various regions of the world, the government of Iran officially confirmed the epidemic on 19 Feb 2020. Since the beginning of the Coronavirus epidemic coincided with the H1N1 flu epidemic, the health system mistakenly recognized it as flu and was slow to notice the onset of the disease (Mardani et al., 2020). One of the challenging issues in the outbreak of COVID-19 is the incubation period, which varies from 3 to 14 days (Tirkolaee, Abbasian, et al., Tirkolaee, Hadian, et al., 2020). That is why the treatments did not work on patients infected by the COVID-19.

To date, some countries product vaccine and only some of them is successful. According to WHO, COVID-19 is an acute respiratory syndrome that is transmitted through respiratory droplets and airways. The examination of this disease needs the serious attention of governments to take the essential measures to reduce the impact of this global epidemic.

From 22 Feb 2020, the COVID-19 risk status went from white to yellow in Iran. In mid-February 2020, Iran became the second focal point for spreading the Coronavirus in the world after China. In early March 2020 and after the COVID-19 epidemic in Iran, the Iranian minister of health stated that health centers throughout the country should be ready to provide services to the COVID-19 patients. Meanwhile, many public places and events, including schools, higher education institutions and universities, movie theaters, concerts and theater performances, national sports competitions and leagues in Tehran and other cities were gradually closed and government office hours were reduced in several provinces. According to official statistics published by WHO, Iran had the highest number of COVID-19 deaths after the USA, Brazil, India, Mexico, UK, Italy and France by the end of November 2020 (WHO, 2020).

On the other hand, the COVID-19 spread in Iran coincided with the increase in US sanctions against the country, which has profoundly affected the Iranian economy and posed challenges to the Iranian government in terms of economics and healthcare (Khalilpourazari & Doulabi, 2021). As a result, it is necessary for the Iranian government to consider the required measures and facilities (as resources) to increase the number of recoveries and prevent the loss of people under these conditions as much as possible. As it turns out, making the necessary estimates and arrangements to deal with this virus is a critical and basic condition for dealing with it. Therefore, in order to obtain estimates in the future circumstances for providing and allocating space, location, equipment and manpower, it is inevitable to model the prevalence and trend of the virus. One of the effective and practical methods in this field is the application of a combination of statistical techniques and Operations Research (OR) under an uncertain environment (Choi, 2021).

Accordingly, the main contributions of this research are given as follows:

  • Developing a novel regression-based Robust Optimization (RO) model to predict the statistics of patients infected by COVID-19,

  • Testing the performance of the proposed methodology compared to Simple Moving Average (SMA), Exponential Moving Average (EMA), Weighted Moving Average (WMA) and Exponential Smoothing with Trend Adjustment (ESTA) models,

  • Investigating a real case study in Iran, discussing the practical implications and suggesting useful managerial insights through a set of sensitivity analyses,

  • Analyzing the capability of the proposed model for estimating the required facilities and equipment during the pandemic.

The structure of the remaining sections is as follows. Section 2 reviews different models proposed by researchers in the field of COVID-19 outbreak. Section 3 explains the problem and develops the proposed regression-based robust optimization model. The real case study and computational results are presented in Sect. 4. Managerial insights and practical implications of the study are discussed in Sect. 5, and finally, Sect. 6 concludes the main achievements and limitations, and provides a useful outlook for the research.

2 Survey on related models

During the outbreak of the COVID-19 pandemic, not only Iran but also all countries throughout the world have been involved and it has caused irreparable problems such as death and destruction of the economic situation and life manner of the people. The sooner the governments and people can get out of this crisis, the less destructive its effects will be as well as the better health and economic situation. In this regard, many researchers tried to model the prevalence and effects of this pandemic, which is reviewed as follows:

2.1 Regression model

Velásquez and Lara (2020) exploited the Gaussian process regression space for chaotic dynamic systems using the information related to 82 days with continuous learning. According to the latest results, COVID-19 can be predicted by Gaussian models and can be fully employed for epidemic outbreaks, along with infections, mortality and recovery rates. The findings show that new quarantine measures with more restrictions on restraint and control strategies implemented in the United States can be successful, but at a late period, can lead to infection and critical mortality for the next 2 months.

Pavlyshenko (2020) examined different regression methods to model the spread of COVID-19 and its influence on the stock market. A Bayesian regression logistic curve model was utilized to analyze the prediction of Coronavirus spread. Moreover, the effect of COVID-19 was examined using the regression model and then compared to the other effects of the crisis. The results demonstrated that different crises have different effects on similar stocks for different reasons. It is important to investigate their impacts independently. The Bayesian inference in the analysis of uncertainty makes possible the effects of the crisis.

In the study conducted by Sannigrahi et al. (2020) the global and local spatial relationships between key socio-demographic variables, cases of COVID-19 and mortality in European regions were evaluated using spatial regression models such that 31 countries were evaluated. Out of 28 primary socio-demographic variables, 2 cases for COVID-19 patients and 3 cases for death from COVID-19 were filtered as key variables for regression modeling. Germany, Austria, Slovenia, Switzerland and Italy yielded the highest correlations with socio-demographic variables. Oztig and Askin (2020) inspected the relationship between human mobility and the number of Coronavirus patients in different countries in 2019. The dataset covered 144 countries and evaluated the relationship between human mobility and COVID-19-infected individuals. They took into account the air travel volume and the number of airports within the Schengen system. Negative binomial regression analysis was employed to investigate the diversity in people infected with COVID. The findings proved a positive relationship between the higher passenger traffic in airlines and the greater number of patients with COVID-19. Countries with more airports were also associated with more COVID-19 cases. Schengen countries, countries with higher population densities and a greater percentage of the aged population, are more likely to get infected with COVID-19 than other countries.

Rath et al. (2020) analyzed the daily statistics of infected people to predict the trend in active cases in Odisha as well as the whole of India. In this research, linear and multi-linear regression models were applied for the infection process of cases. The results showed that the linear and multi-linear regression models have a strong correlation of 0.99 and 1.0, which expresses a strong predictive model for predicting active cases in the coming days. While the quarantine has become a widely-used control trajectory during the outbreak of COVID-19, Lu et al. (2020) stated that empirical research on the quarantine and attitudes toward COVID-19 affects mental health very little. It was demonstrated how these relationships vary in the distribution of mental health scores using a cross-sectional online study. Moreover, quantitative regression analysis figured out that the quarantine at home reduces depression and increases satisfaction. In contrast, quarantine at the community level reduces vitality, especially for people with lower fragility.

Li et al. (2020) utilized an ML approach to defining novel factors related to the COVID-19 transmission and fatality. They employed the logistic regression model and stated critical factors related to COVID19 infection, death, and case fatality rates in 154 countries and 50 U.S. states. According to the results, high temperature, economic inequality and sports events can facilitate the COVID-19 outbreak, but blood types B and AB and increasing hospital beds decrease it. Duhon et al. (2021) surveyed the impact of non-pharmaceutical interventions (NPIs) including social, demographic and climatic components on the growth rate of COVID-19. They found out that all NPIs do not affect the initial growth rate of COVID-19 but social-demographic and climatic variables are very effective on COVID-19 outbreak.

2.2 MCDM model

Ashraf et al. (2020) proposed a spherical intelligent fuzzy decision model to manage and detect the COVID-19 infected cases in terms of transmission and propagation. Moreover, they developed a new method based on the Ideal Solution-like Preference Sequence (TOPSIS) and Complex Proportional Assessment (COPRAS) techniques in a spherical fuzzy environment. Eventually, an overview of the COVID-19 emergency was presented to demonstrate the efficacy of the proposed methodology, accompanying sensitivity and comparative analysis, feasibility and reliability of the results. Hezer et al. (2021) applied several Multi-Criteria Decision-Making (MCDM) techniques include to evaluate the safety levels of 100 regions in the world affected by COVID-19.

2.3 SEIR model

Chatterjee et al. (2020) surveyed the healthcare impact of COVID-19 epidemic with a stochastic mathematical model. The results showed that all Indian people will be overwhelmed by the end of May 2020 unless accurate attention and effort pay off by considering India’s healthcare resources.

Liang (2020) revealed the propagation laws of three phenomena: COVID-19, SARS and MERS. He compared the expansion peculiarities of COVID-19 with the characteristics of SARS and MERS. He could realize that a growth model is obtained for their growth by regarding the growth rate and inhibition of infectious diseases. The growth rate of COVID-19 is almost double the SARS and MERS. The doubling cycle of COVID-19 growth is 2–3 days, indicating that the number of patients doubles 2–3 days. Sarkar et al. (2020) presented a mathematical model to predict the dynamics of COVID-19 in 17 provinces of India and finally throughout India. The results revealed that to decrease the incidence of infection, the amount of contact between infected and non-infected people through quarantine of susceptible people should be effectively reduced. Moreover, based on the model simulation, SARS-CoV-2 pandemic can be eliminated by combining limited social distances and contact tracking.

Hengjian and Tao (2020) introduced different nonlinear growth curves for the cumulative prediction of COVID-19 patients. The Richards curve was shown to be logical and flexible in this prediction. Furthermore, a new nonlinear regression model was developed to predict COVID-19 patients. They demonstrated that the COVID-19 situation prediction in China has been well done, including time-based and sequential forecasts. According to Pandey et al. (2020), the COVID-19 epidemic has become a major threat to India that needs to be treated by all means. In this study, the prevalence of the disease for India was analyzed by March 30, 2020 and predictions were made for the number of cases over 2 weeks. They used both Susceptible-Exposed-Infectious-Removed (SEIR) model and regression models to conduct the predictions based on the data obtained from Johns Hopkins University between January 30, 2020 and March 30, 2020. They concluded that the number of cases may increase up to 5000–6000 in the considered 2 weeks.

Pan et al. (2021) performed an ecological research in 202 locations (in 8 countries) to survey the effects of meteorological elements (relative humidity, temperature, wind speed and UV radiation) on the transmission of COVID19. The obtained results showed that meteorological conditions did not have a noticeable effect on the transmission statistically. Khalilpourazari and Doulabi (2021) designed a hybrid reinforcement learning algorithm and SIDARTHE to predict the trend of COVID-19 pandemic in Quebec, Canada. They showed that the proposed methodology provides quality solutions for most complex benchmarks. Kumar et al. (2021) developed a dynamic transmission and SEIR-V model to investigate the impact of social media on the number of influenza and COVID-19 cases in terms of infection and deaths. The main findings indicated that social media help to respond to similar disasters.

2.4 Neural network model

Pirouz et al. (2020) used a Binaural Grouping Model (BGM) based on Group Method of Data Handling (GMDH) as one of the Artificial Intelligence (AI) techniques. To this end, Hubei province in China was considered to evaluate the built-up model. Several important components, including minimum, maximum, and average daily temperature, relative humidity, the density of a city, and wind speed, were taken into account as the input data (criteria). It was unfolded that the proposed BGM model represents a greater performance capacity in predicting confirmed items for 30 days. Furthermore, the regression analysis and trend of confirmed items were performed in comparison with variations of daily climatic parameters. The findings showed that relative humidity and maximum daily temperature have the greatest effect. Nikolopoulos et al. (2020) provided some predictive analysis tools for immediate application during the COVID-19 pandemic. They used the data collected from the middle of April 2020 data in the United Kingdom, United States, India, Germany, and Singapore to forecast COVID-19 growth rates nationwide. The growth rate of COVID-19 was predicted by epidemiological, statistical, Machine Learning (ML) and in-depth models and a new hybrid approach based on nearest neighborhoods and clustering techniques. Using ancillary data (Google trends) and simulating government decisions, they modeled and predicted excessive demand for products and services during pandemics.

Dansana et al. (2020) utilized the convolutional neural network (CNN) to scan X-ray images, computed tomography (CT) and decision tree model to detect COVID-19. It was revealed that the exact version of the decision tree model shows a very satisfactory performance.

Table 1 categorizes the most relevant research in the literature and compares the methodology, goal, and country (origin) of the case study.

Table 1 Most relevant studies in the field of COVID-19 prediction trend

Given the research gap in Table 1, the main novelty of this study is the provision of a regression-based robust optimization model concerning possible changes in the statistics. In other words, it is necessary to design a model to predict the incidence of COVID-19 under its complicated uncertainty that can be efficiently utilized in future decision-making processes.

3 Problem statement

Coronaviruses are a large family of viruses that can infect both humans and certain types of animals. Many of the known coronaviruses make a range of respiratory infections happen in humans. The newly-discovered Coronavirus is not exempt from these infectious diseases. The emerging virus was unknown before the recent disease outbreak in December 2019 in Wuhan, China.

According to these conditions, the statistics of daily patients (\(y_{i}\)) and its prediction can be very helpful for effective planning and dealing with this disease. As can be considered, day counter (\(x_{i}\)) that represents the statistics of daily patients of COVID-19 (\(y_{i}\)) is predicted based on the proposed methodology (Fig. 1). The aim is to draw and define related between day i (\(x_{i}\)) and daily patients (\(y_{i}\)), and because of uncertainty in daily patients (\(y_{i}\)), the convex RO method is utilized. The number of daily patients (\(y_{i}\)) have a nominal value (\(\tilde{y}_{i}\)) and take value in \(y_{i} \in [\tilde{y}_{i} - \hat{y}_{i} ,\tilde{y}_{i} + \hat{y}_{i} ].\)

Fig. 1
figure 1

System configuration under the COVID-19 pandemic

3.1 Proposed model

This sections introduces our proposed model which is applicable for short-term forcasting. To do so, relative days and number of cases are taken into consideration for projecting the course of COVID-19. In the following, the indices, parameters and variables to develop the proposed model are defined first.

Indices

\(i\) Index of days; \(i \in \{ 1,...,\left| I \right|\}\),

\(k\) Index of regression degree; \(k \in \{ 1,...,\left| K \right|\}\).

Parameters

\(x_{i}\) Day counter (day i),

\(y_{i}\) Number of infected people in day i, \(y_{i} \in [\tilde{y}_{i} - \hat{y}_{i} ,\tilde{y}_{i} + \hat{y}_{i} ],\)

\(\tilde{y}_{i}\) Nominal valusse of infected people in day i,

\(\hat{y}_{i}\) Tolerance of infected people,

\(\rho\) Uncertainty coefficient of robustness,

\(\Gamma_{i}\) Robustness budget (conservatism level) in day i.

Variables

\(a_{k}\) Coefficient k-regression function,

\(y^{\prime}_{i}\) Number of forecasted infections in day i,

\(hp_{i} ,\,hm_{i}\) Auxiliary variables for linearizing the absolute functions,

\(z_{i} ,\,r_{i0}\) Auxiliary variables related to robust optimization approach,

\(R^{2}\) Coefficient of correlation.

Model 1. Polynomial regression with MAD

$$ {\text{min}}\,\,MAD = \sum\limits_{i} {\frac{{\left| {y_{i} - y^{\prime}_{i} } \right|}}{\left| I \right|}} $$
(1)

subject to

$$ y^{\prime}_{i} = \sum\limits_{k} {a_{k - 1} x_{i}^{k - 1} } \quad \forall i, $$
(2)
$$ a_{k} \in {\mathbf{R}} \quad \forall k. $$
(3)

The objective function (1) tries to minimize the Mean Absolute Deviation (MAD). The MAD includes the mean deviation of the number of infected people in day i and the number of forecasted infected people (Jeyakumar et al., 2014). Constraint (2) represents functional forecasting regression, including parameters and decision variables. Constraint (3) indicates the decision variables of \(x_{i}\) which take real numbers. In other words, it shows the range of regression function coefficients.

3.2 Linearization of proposed model

Given the absolute function in Model 1, it should be linearized using the following equations in Model 2. Therefore, two new positive variables are employed to do.

If \(k = \left| \Omega \right|,\) therefore, we can consider the following replacements in the model;\(k = \alpha + \beta ,\,\,\,\,\Omega = \alpha - \beta ,\,\,\,\alpha ,\beta \ge 0.\)

Model 2. Linearization of Model 1.

$$ {\text{min}}\,\,MAD = \sum\limits_{i} {\frac{{hp_{i} + hm_{i} }}{\left| I \right|}} $$
(4)

subject to

$$ y_{i} - y^{\prime}_{i} = hp_{i} - hm_{i} \quad \forall i, $$
(5)
$$ hp_{i} ,hm_{i} \ge 0 \quad \forall i, $$
(6)

Constraints (2) and (3).

3.3 Robust optimization

Given the conditions and challenges of the census and the problems raised by identifying patients, the impacts of uncertainty in modeling should be considered. One of the issues that medical centers have encountered is the ambiguities in identifying the number of patients.

As the modeling trend shifted from certainty to uncertainty, different methods have been proposed according to the depth of uncertainty. First of all, techniques based on historical data, robust stochastic programming, and by increasing the uncertainty, fuzzy possibilistic programming, and finally, robust convex programming can be found in the literature (Bairamzadeh et al., 2018). Therefore, a robust convex counterpart model is developed in this research with respect to the type of uncertainty.

Among different methods developed to cope with uncertainty, RO approaches are among the most efficient and applicable ways to deal with uncertainty in optimization problems (Golpîra & Tirkolaee, 2019; Lotfi et al., 2020; Tirkolaee, Abbasian, et al., 2021; Tirkolaee, Goli, et al., 2020; Tirkolaee, Hadian, et al., 2020). Bertsimas and Sim (2004) developed an optimization technique based on a set of multidimensional uncertainties. They claimed that rarely all the uncertainty parameters of a constraint take values different from their nominal values. Accordingly, they define the uncertainty for each parameter that has uncertainty, it has nominal value \(\tilde{a}_{ij}\) and takes value in \(a_{ij} \in [\tilde{a}_{ij} - \hat{a}_{ij} ,\tilde{a}_{ij} + \hat{a}_{ij} ]\) that is as follows:

$$ z_{ij} = \frac{{a_{ij} - \tilde{a}_{ij} }}{{\tilde{a}_{ij} }}\, \in \,\left[ { - 1,1} \right], $$
(7)
$$ \sum\limits_{j} {\left| {z_{ij} } \right|} \le \Gamma_{i} \quad\forall i,\Gamma_{i} \in \left[ {0,\left| {J_{i} } \right|} \right]. $$
(8)

where \(J_{i}\) refers to the set of uncertainty parameters in the i-th row in the constraint coefficient matrix. Constraint coefficients have uncertainties such that they take a central or a nominal value \(\tilde{a}_{ij}\) and a radius value \(\hat{a}_{ij}\). Here, the interval is represented by \([\tilde{a}_{ij} - \hat{a}_{ij} ,\tilde{a}_{ij} + \hat{a}_{ij} ]\). Moreover, \(\Gamma_{i}\) represents the level of conservatism and is called budget uncertainty. When \(\Gamma_{i} = 0\), all the uncertainty parameters of the problem take the nominal value of the interval and the robust model results in a deterministic model.

Conversely, the robust model turns into the robust model proposed by Soyster’s model if \(\Gamma_{i} = \left| {J_{i} } \right|\)(Ben-Tal & Nemirovski, 2002). It takes all the uncertainty coefficients in the worst possible case (as the upper limit of the interval). By assigning the value to which is selected from the interval, there is an exchange between robustness and optimization in the problem. In fact, the decision-maker controls the parameter in terms of risk aversion and the importance of the constraint. Now, the RO model proposed by Bertsimas and Sim (2004) is given as follows

$$ {\text{min}}\,\,Z = \sum\limits_{j} {c_{j} x_{j} } $$
(9)

subject to

$$ \sum\limits_{j} {a_{ij} x_{j} } \le b_{i} \quad \forall i,\; \Rightarrow \sum\limits_{j} {\tilde{a}_{ij} x_{j} } + z_{i} \Gamma_{i} + \sum\limits_{{j \in J_{i} }} {r_{ij} } \le b_{i} \quad \forall i, $$
(10)
$$ l_{j} \le x_{j} \le u_{j} \quad \forall j. \quad z_{i} + r_{ij} \ge \hat{a}_{ij} E_{j} \quad \forall i,j, $$
(11)
$$ - E_{j} \le x_{j} \le E_{j} \quad \forall j, $$
(12)
$$ l_{j} \le x_{j} \le u_{j} \quad \forall j. $$
(13)

where \(r_{ij}\), \(z_{i}\) and \(E_{j}\) are the dual variables preventing the model from being non-linearized. The violation probability of ith constraint is denoted by Eq. (14):

$$ \Pr \left(\sum\limits_{j} {a_{ij} x_{j}^{*} } \ge b_{i}\right) \le 1 - \phi \left(\frac{{\Gamma_{i} - 1}}{{\sqrt {\left| {J_{i} } \right|} }}\right) \quad \forall i, $$
(14)

whereas,

$$ \phi (\theta ) = \frac{1}{{\sqrt {2\pi } }}\int\limits_{ - \infty }^{\theta } {\exp \left( - \frac{{y^{2} }}{2}\right)dy} . $$
(15)

Here, \(\phi (\theta )\) follows a normal distribution. Hence, the advantage of Eq. (8) is revealed such that the worth of the budget parameter is not equivalent to different constraints and depends on its importance.

3.4 Robust optimization of the proposed model

Regression models use OR tools for forecasting, but in some situations that a specific constraint is needed to be taken into account in the model, such as convex uncertainty, we cannot employ regression models. Accordingly, RO is recommend as an efficient approach considering the computational accuracy. Now, using the RO framework proposed by Bertsimas and Sim (2004), Model 3 presents the regression-based robust optimization model of the study (Khalilpourazari & Pasandideh, 2021):

Model 3. Robust optimization of the proposed model (Model 2) subject to

$$ {\text{min}}\,MAD = \sum\limits_{i} {\frac{{hp_{i} + hm_{i} }}{\left| I \right|}} $$
$$ y^{\prime}_{i} + hp_{i} - hm_{i} + z_{i} \Gamma_{i} + r_{i0} = \tilde{y}_{i} \quad \forall i, $$
(16)
$$ z_{i} \Gamma_{i} + r_{i0} \ge \rho \tilde{y}_{i} \quad \forall i, $$
(17)
$$ r_{i0} \ge 0,\;\Gamma_{i} = 1 \quad \forall i, $$
(18)

Constraints (2), (3), (6).

3.5 Correlation coefficient of the proposed model

After estimating the parameters of polynomial regression, we need to measure dependency and quality of response. We employed the coefficient of correlation (\(R^{2}\)) and MAD to measure the quality of response. Finally, the correlation coefficient is calculated according to Eq. (19):

$$ R^{2} = \frac{{n\sum\nolimits_{i} {\mathop y\nolimits_{i}^{'} \tilde{y}_{i} } - \sum\nolimits_{i} {y_{i}^{'} } \sum\nolimits_{i} {\tilde{y}_{i} } }}{{\left[ {\left( {n\sum\nolimits_{i} {y_{i}^{'} - \left( {\sum\nolimits_{i} {y_{i}^{'} } )^{2} } \right)} } \right)\left( {n\sum\nolimits_{i} {\tilde{y}_{i}^{2} - \left( {\sum\nolimits_{i} {\tilde{y}_{i} } } \right)^{2} } } \right)^{{1/2}} } \right]}} - 1 \le R^{2} \le 1. $$
(19)

3.6 Well-known methods and compared with the proposed model

To further validation of our proposed model, several well-known forecasting models include Simple Moving Average (SMA), Exponential Moving Average (EMA), Weighted Moving Average (WMA) and Exponential Smoothing with Trend Adjustment (ESTA) are taken into account and explained in the following.

SMA

$$ y^{\prime}_{i} = \frac{{\sum\limits_{i} {\tilde{y}_{i} } }}{n} \quad \forall i $$
(20)

EMA

$$ y^{\prime}_{i + 1} = \alpha (\tilde{y}_{i} - y^{\prime}_{i} ) + y^{\prime}_{i} \quad \forall i $$
(21)

WMA

$$ y^{\prime}_{i} = \frac{{\sum\limits_{i} {w_{i} \tilde{y}_{i} } }}{{\sum\limits_{i} {w_{i} } }} \quad \forall i $$
(22)

ESTA

$$ y^{\prime}_{i} = T_{i} + F_{i} , $$
$$ F_{i} = \alpha (\tilde{y}_{i - 1} - y^{\prime}_{i - 1} ) + y^{\prime}_{i - 1} , $$
$$ T_{i} = \beta (F_{i} - F_{i - 1} ) + T_{i - 1} . \quad \forall i $$
(23)

4 Findings

In this section, the computational results of the research are provided based on the real-world data and parameters which were taken from Worldometers (2020) between February 15, 2020, and April 5, 2020, related in Iran. The parameters are given in Table A in the Appendix. The number of infections (total case) in Iran is shown in Fig. 2. As can be understood from Fig. 2, the number of patients follows an upward trend. In this regard, many strategies and restrictions have been set in Iran to possibly control it such as curfew at night, mandatory use of masks in all spaces and centers and reduction in work time. Various tools and applications have also been utilized to inform the people. Therefore, it was necessary to estimate the patients' statistics and determine the estimation function to predict the status of COVID-19.

Fig. 2
figure 2

Number of infections (Total case) COVID-19 in Iran

CONOPT solver/GAMS and Python software are employed, running on the system with Core™ i5 CPU @ 2.4 GB and 6 GB RAM to solve the proposed Model. Here, 80% of the data was used for training and 20% for testing. It can be concluded that this method can be extended and applied to other countries and data. The output results of Model 3 under different regression coefficients are delineated in Fig. 3. Moreover, we can see forecasting from results and draw with a dotted line in Fig. 3.

Fig. 3
figure 3

Number of infections (Total case) COVID-19 under different regression coefficients

The behavior of MAD is drawn from degrees 1 to 15 in Fig. 4. Accordingly, by increasing the regression coefficient until degree 8, the MAD value decreases and the equation becomes more accurate, and from the 8th coefficient onwards, the MAD value increases (c.f. Table 2 and Fig. 4).

Fig. 4
figure 4

MAD values for regression degrees 1 to 15

Table 2 Output results of the regression-based optimization model based on different coefficients Model 3 \((\rho = 0.00)\)

The coefficients of the prediction function are also estimated in Table 2 and the correlation value \(R^{2}\) for each estimation function is given. The higher correlation with minimum MAD is the superior prediction function of estimating the values. As a result, regression degree 8 is suitable for estimating the number of infections (total case). Moreover, the output results predict the growth in the number of patients, according to Fig. 3. Moreover, the Ordinary Least Squares (OLS) assumptions (Craven & Islam, 2011) are employed to validate and apply Models 2 and 3. According to Table 3, with increasing the level of regression uncertainty, the amount of MAD also decreases, and the equation becomes more accurate (see Table 3 and Fig. 5).

Table 3 Effects of uncertainty in the objective function of the regression-based robust optimization Model (3)
Fig. 5
figure 5

Impacts of studying the uncertainty in the forecasting process

Furthermore, the prediction results of Model 3 are shown in Fig. 6, and considering the level of uncertainty, the future trend is also predicted which yields a lower MAD value. According to Table 4 and comparison made between our models and the other four models. It is revealed that Model 3 has the least MAD and the greatest correlation index than the other models.

Fig. 6
figure 6

Comparisons of the results under uncertain and deterministic conditions

Table 4 Comparison of the proposed model with other models

5 Managerial implications and practical insights

In this study, the dynamics of the COVID-19 outbreak were investigated and predicted using a regression-based robust optimization mathematical model. Five other methods examined the advantages of the developed model and its superiority was finally concluded. In this model, the uncertainty role in the problem was comprehensively addressed considering the dynamics of the virus that can be interpreted in terms of the statistics of confirmed, non-identified, and also patients who have not yet done the tests. On the other hand, nowadays, we are encountering new mutations of the COVID-19 that may lead to a significant rising in the number of patients. Therefore, health managers should consider the issue of uncertainty to provide the required medical equipment to deal with this disease, such as oxygen generators, which are one of the most important pieces of equipment. In other words, the managers need to constantly make accurate predictions of the growing trend of the virus as much as possible. That is why this paper tried to provide a simple and efficient tool to help them by investigating the real conditions of the epidemic in Iran. Although WHO suggested the use of SIR differential models to project the course of COVID-19 pandemic (World Health Organization, 2020), but it was demonstrated that our proposed methodology can be utilized to improve the quality of the predictions in IHME (2021), which is one of the reference sites to project the current and future situation of countries. It may require more complex calculations in some cases. However, the reliability and consideration of uncertainty in the model can efficiently resolve the drawbacks of previous methods.

On the other hand, the outcome of our proposed model can be also utilized by managers to have a perspective on estimating the required resources. To be more specific, based on the upward trend of the pandemic and the growing number of confirmed cases, the shortcoming of oxygen generators in health centers has become a serious problem. Therefore, it is required to increase the capacity as much as possible. In other words, health centers have changed the application of regular and special hospital beds (b) and assigned them to the COVID-19 patients who need more oxygen consumption (\(Q_{oxygen}\)). In Eq. (24), \(v_{b}\) and \(n_{b}\) represent the oxygen consumption coefficient for bed b and the number of bed type b. The beds consist of ICU, CCU, NICU, operation room, endoscopy and regular hospitalization and recovery), respectively. Overall, the total consumption amount of oxygen has increased significantly, which should be regarded in the estimation (Smith, 1995).

$$ Q_{oxygen} = \sum\limits_{b} {n_{b} v_{b} } \;b \in \{ {\text{ICU,CCU,}}\,{\text{NICU,}}\,{\text{Operation,}}\,{\text{Endoscopi,}}\,{\text{Regular,}}\,{\text{Recovery}}\} \; $$
(24)

The above models can be utilized to predict and estimate the requirements so that according to the WHO report, acute patients require 5–15 L per minute (LPM) of oxygen, and for critical patients, 30 LPM of oxygen are needed. In fact, if there is a shortage of equipment, its amount should be estimated exactly by applying standard coefficients.

6 Conclusion and outlook

As it is clear, making the necessary estimates and arrangements to deal with the Coronavirus pandemic is a necessary and basic condition to be regarded by the governments. Therefore, to obtain estimates in future conditions for allocating space, location, equipment and manpower, it is inevitable to model the prevalence and trend. One of the good and effective techniques in this field is to implement a combination of statistical methods and Operation Research (OR) under uncertainty. On the other hand, it was necessary to study the effects of uncertainty in modeling due to the conditions of the census and the challenges in identifying patients. The consideration of the uncertainty positively affected the estimation function of the number of infections and led to the minimization of the estimated risk value. In fact, we tried to estimate and predict the number of patients in Iran using a regression-based robust optimization model. The main advantage is related to the estimation of necessary facilities and equipment in the future to control the pandemic. Based on the literature and the current study, it can be inferred that controlling and reducing the amount of contact between infected and non-infected people through quarantine of susceptible individuals can effectively reduce the incidence of Coronavirus. Combining social distances and tracking people can help reduce the incidence of disease.

The main findings of this study are as follows.

  1. 1.

    By increasing the degree (coefficient) of regression up to 8, the amount of MAD decreased to 1378.123 and the applied equation became more accurate, and from the 8th degree onwards, the amount of MAD increased,

  2. 2.

    Furthermore, by increasing the level of regression uncertainty, the amount of MAD followed a downward trend to reach 1309.28 and the accuracy of the model in estimation increased. The proposed model (Model 3) had the least MAD and the greatest correlation index than the other models,

  3. 3.

    Our proposed methodology provided new insights into the COVID-19 pandemic and can be considered a useful tool for decision-makers.

The main limitation of this study was the inability of the proposed methodology to cope with large dimensions with more data, and due to this, it is suggested to apply big data analytics methods (Ayyıldız et al., 2018; Zdravevski et al., 2020) and heuristic/meta-heuristic algorithms (Alinaghian et al., 2020) for future research opportunities. Furthermore, other uncertainty techniques such as fuzzy programming (Tirkolaee, Abbasian, et al., 2020; Tirkolaee, Goli, et al., 2020), robust scenario-based (Chen et al., 2019; Homayouni et al., 2021; Lotfi et al., 2021a, 2021b; Lotfi et al., 2020; Özmen et al., 2017), optimal control (Savku & Weber, 2018) and artificial neural network (Graczyk-Kucharska et al., 2020) can be implemented and compared with the proposed RO approach. On the other hand, it is also necessary to study recovered patients, deaths, and other statistics to develop an efficient mathematical model.