1. Introduction
Coronavirus disease (COVID-19) was first detected in December 2019 in Wuhan, China. It then spread to countries all over the world, and the WHO declared it a pandemic [1] on 11 March 2020. The new virus has different characteristics from the family it came from (coronaviruses): it has a higher infection rate and an average incubation period of 5.1 days, with a maximum incubation period of 14 days [2]. Infection occurs through physical contact with infected individuals or contaminated surfaces. New public safety measures have been adopted to mitigate the impact of the pandemic, such as cancelling public events, moving education online, and closing clubs and other venues for social gatherings like concerts and sporting events. The total number of confirmed cases worldwide reached 62.844 million on 1 December 2020, with 1.465 million deaths [3]. The main challenge in studying epidemics is predicting the behavior of the disease: how many people will be infected in the future, when the pandemic will peak, when a second wave will occur, and the total number of deaths once the pandemic ends. Researchers from fields such as applied mathematics, data science, and epidemiology have been working on these predictions. Based on this analysis, governments can take proper actions to limit human and economic losses. The coronavirus is not only a health crisis but also an economic crisis in which people are losing their jobs without knowing when normality will return. The International Labour Organization estimates that 25 million people could lose their jobs [4]. The predicted peak date varies from one country to another, and the simulation of these variations could depend on people's behavior toward social distancing and hygiene measures, as well as on temperature, relative humidity, and wind speed.
It is known that the human immune response is weakened during the winter season, as the cold, dry air dries out the mucus in the nose, which acts as a first line of defense against viruses. Studies have shown that as the temperature increases, an infected person spreads COVID-19 to fewer people. Some studies also indicate that the virus cannot survive at 86 °F. In this paper, therefore, the countries are grouped into three main categories, which are reflected in the rate of human cell infection. The first category of countries, with low transmission rates, has an average annual temperature between 60 and 100.4 °F. The second category, with medium transmission rates, has an average annual temperature between 37 and 89 °F. The third category, with the highest transmission rates, has an average annual temperature between 36.14 and 66.3 °F. The higher the spread rate, the higher the rate of infection of human cells and their exposure to the virus; thus, the likelihood that human cells sustain the virus depends on the location of the country and on the category to which it belongs.
A few published papers have analyzed COVID-19. In [5], the authors used numerical approaches and a logistic modelling technique to carry out a complete analysis of COVID-19. In [6], the authors built a fractional-order modified SEIR model of pandemic diseases and applied it to COVID-19 to predict the spreading behavior of the virus in Pakistan and Malaysia. In [7], the authors made predictions about coronavirus transmission dynamics in African countries. The model parameters (protection rate, infection rate, average incubation time, average quarantined time, cure rate, and mortality rate) were selected using the Metropolis-Hastings (MH) parameter optimization method [8]. Other authors modelled COVID-19 daily confirmed cases in Egypt and Iraq using a Gaussian fitting model and a logistic model [9,10,11]. The dynamic behavior of the virus is modelled with a new SEIR model. One important quantity to calculate when modelling virus dynamics is the basic reproduction number: the expected number of secondary infections produced by an infected individual in a population in which all individuals are susceptible to infection. Controlling this value helps eliminate the disease. In [12], the authors built a multi-strain modified SEIR epidemic model for COVID-19 and gave a complete analysis of how to control the value of the reproduction number in order to control upcoming virus peaks.
In this paper, state-of-the-art regression models are used to model daily confirmed cases in different countries as a way to predict upcoming coronavirus waves. Countries are classified based on the duration of a full viral wave and on the average annual temperature: the shorter the viral wave, the higher the virus transmission rate. A Fourier model and a sum of sine-waves model are used to fit the daily confirmed data and predict the upcoming wave peaks for these countries. The actual data from [3,13] are used to generate the predictive regression models without any statistical modification. The regression models fit the data from 1 March 2020 (day 0) to 15 November 2020 (day 260); hence, the models can predict different scenarios for each country in the period from 16 November 2020 (day 261) to 10 April 2021 (day 400). The real smoothed data from [14] were used to verify the accuracy of the prediction models. All datasets used in this paper are attached in a separate supplementary material file. The countries under study are grouped into three main categories:
- First category: countries with low transmission rates, in which a coronavirus wave takes about two seasons of the year (about 180 days) to complete a full virus cycle (case studies: Saudi Arabia and Egypt).
- Second category: countries with higher transmission rates, in which a wave takes about one season of the year (about 90 days) to complete a full virus cycle. These countries go through offline periods with a low spreading rate before entering the next wave cycle (case studies: United Kingdom, Germany, and Italy).
- Third category: countries with the highest transmission rates, in which a wave takes about one season of the year (about 90 days) to complete a full cycle, with no offline periods between consecutive waves (case studies: United States of America and Russia).
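The categorization above can be sketched as a small rule, shown here only as an illustration. The function name `wave_category` and the 135-day threshold (the midpoint between the approximate 90-day and 180-day cycles) are assumptions introduced for this sketch and are not taken from the paper.

```python
def wave_category(cycle_days, has_offline_periods, threshold_days=135):
    """Return 1, 2, or 3 following the three categories described in the text.

    threshold_days is a hypothetical cut-off between the ~180-day cycle
    (first category) and the ~90-day cycle (second and third categories).
    """
    if cycle_days <= 0:
        raise ValueError("cycle_days must be positive")
    if cycle_days >= threshold_days:
        return 1  # long cycle, low transmission rate
    # Short cycle: the categories differ by whether quiet periods separate waves.
    return 2 if has_offline_periods else 3


# Illustrative inputs only; real cycle lengths would come from the fitted curves.
first = wave_category(180, has_offline_periods=True)
second = wave_category(90, has_offline_periods=True)
third = wave_category(90, has_offline_periods=False)
```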
The remainder of this paper is organized as follows:
Section 2 describes the mathematical background of the proposed predictive modelling.
Section 3 describes the optimization algorithm used to fit the data.
Section 4 presents the results of applying the predictive models to the available data for the countries in the three categories. Finally, the conclusions are given in Section 5.
2. Predictive Mathematical Modelling
Mathematical curve fitting is key to obtaining a mathematical relation between measured values and their input parameters. Such relations are called regression models [15]; examples include the Polynomial Model, Exponential Model, Power Model, and Fourier Model [16,17,18]. The suitability of a model for representing coronavirus data can be measured by means of the determination coefficient ($R^2$), which takes values from 0 to 1. A higher value of $R^2$ means higher model accuracy [18]. $R^2$ can be calculated from Equation (1) with N data points.
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \qquad (1)$$
where $\bar{y}$ is the average value of the measured data, $y_i$ is the measured data value at time $t_i$, and $\hat{y}_i$ is the corresponding value given by the fitting model equation.
The curve fitting techniques used here to model daily confirmed cases of COVID-19 and to predict upcoming scenarios are Fourier fitting models and sum of sine-waves fitting models.
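As a minimal sketch of how the determination coefficient of Equation (1) can be computed, the following Python function takes the measured values and the values produced by a fitting model. The function name `r_squared` and the toy numbers are hypothetical and serve only to illustrate the arithmetic.

```python
def r_squared(measured, fitted):
    """Determination coefficient: 1 - (residual sum of squares / total sum of squares)."""
    if not measured or len(measured) != len(fitted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(measured) / len(measured)  # average of the measured data
    ss_res = sum((y - f) ** 2 for y, f in zip(measured, fitted))
    ss_tot = sum((y - mean) ** 2 for y in measured)
    if ss_tot == 0:
        raise ValueError("measured data are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data, not taken from the paper's datasets.
measured = [10.0, 20.0, 30.0, 40.0]
perfect = r_squared(measured, measured)                  # exact fit
rough = r_squared(measured, [12.0, 18.0, 33.0, 37.0])    # 1 - 26/500
```

For the toy data, the residual sum of squares is 26 and the total sum of squares is 500, so the second call gives a value of about 0.948.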
2.1. Fourier Fitting Models
Fourier models are used to model periodic functions and have three main parts: a constant term, cosine-wave terms, and sine-wave terms. Because the coronavirus spreads like a wave and may recur, its behavior can be assumed to be nearly periodic, which makes a Fourier fitting model a good choice here. The Fourier fitting model equation with n terms is presented in Equation (2) [19], where $\hat{y}(t)$ is the model value at time t and w is the fundamental frequency of the sine and cosine terms.
$$\hat{y}(t) = a_0 + \sum_{j=1}^{n} \left[ a_j \cos(j w t) + b_j \sin(j w t) \right] \qquad (2)$$
($a_0$, $a_j$, $b_j$) are unknown coefficients, selected using numerical optimization techniques to satisfy the least-squares regression criterion between the data and the fitting model.
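To make the structure of the n-term Fourier model concrete, here is a small Python sketch that evaluates a constant term plus cosine and sine harmonics of a fundamental frequency w. The function name `fourier_model` and all coefficient values are assumptions chosen for illustration; they are not fitted values from this study.

```python
import math


def fourier_model(t, a0, a, b, w):
    """Evaluate a0 + sum_j [a[j-1]*cos(j*w*t) + b[j-1]*sin(j*w*t)] at time t."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of terms")
    total = a0
    for j, (aj, bj) in enumerate(zip(a, b), start=1):
        total += aj * math.cos(j * w * t) + bj * math.sin(j * w * t)
    return total


# Hypothetical one-term model whose period is 180 days.
w = 2 * math.pi / 180
curve = [fourier_model(t, 100.0, [-50.0], [0.0], w) for t in range(0, 401)]
```

With these illustrative coefficients the curve starts at 50, reaches 150 at day 90, and repeats every 180 days, which is the kind of periodic behavior the Fourier model assumes.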
2.2. Sum of Sine-Waves Fitting Models
Sine waves are commonly used to model periodic functions, and a sum of sine waves with different frequencies can be used to model nearly periodic functions. These models are used here to fit the daily confirmed coronavirus cases in the countries under study. Modelling with this kind of equation also makes it possible to predict the upcoming coronavirus waves in these countries. Equation (3) represents the sum of sine-waves equation with n terms, where $\hat{y}(t)$ is the model value at time t.
$$\hat{y}(t) = \sum_{j=1}^{n} a_j \sin(b_j t + c_j) \qquad (3)$$
($a_j$, $b_j$, $c_j$) are unknown coefficients, optimized using numerical optimization techniques to minimize the error between the model and the measured daily confirmed cases.
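The way a fitted sum of sine-waves model can be used to locate an upcoming peak may be sketched as follows: the model is evaluated on each day of the prediction window and the day with the largest value is reported. The names `sum_of_sines` and `peak_day` and the coefficient values are hypothetical; a term with zero frequency and phase π/2 is used here only as a convenient constant offset.

```python
import math


def sum_of_sines(t, terms):
    """Evaluate sum_j a_j * sin(b_j * t + c_j) for terms given as (a, b, c) tuples."""
    return sum(a * math.sin(b * t + c) for a, b, c in terms)


def peak_day(terms, first_day, last_day):
    """Return the day in [first_day, last_day] with the largest model value."""
    days = range(first_day, last_day + 1)
    return max(days, key=lambda d: sum_of_sines(d, terms))


# Hypothetical coefficients: a constant offset plus one wave with a 90-day period.
terms = [
    (1500.0, 0.0, math.pi / 2),               # sin(pi/2) = 1, so this term is constant
    (1000.0, 2 * math.pi / 90, -math.pi / 2),  # peaks at days 45, 135, 225, 315, ...
]
predicted_peak = peak_day(terms, 261, 400)     # prediction window used in the paper
peak_value = sum_of_sines(predicted_peak, terms)
```

For these illustrative coefficients the only peak inside the window from day 261 to day 400 falls on day 315.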
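The optimization algorithm used to select the coefficients of each model is the invasive weed optimization (IWO) algorithm [20]. The optimization process increases model accuracy by decreasing the root mean square error (RMSE) between the N measured data points $y_i$ (at times $t_i$) and the fitting model. The objective function minimized by the optimization algorithm when modelling the data with the n-term Fourier model is given in Equation (4). Similarly, the objective function used when modelling with the sum of sine-waves is given in Equation (5).
$$\min \; \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - a_0 - \sum_{j=1}^{n} \left[ a_j \cos(j w t_i) + b_j \sin(j w t_i) \right] \right)^2} \qquad (4)$$
$$\min \; \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \sum_{j=1}^{n} a_j \sin(b_j t_i + c_j) \right)^2} \qquad (5)$$
Both minimizations are subject to constraints on the model coefficients.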
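The following is a minimal sketch of an invasive weed optimization loop minimizing an RMSE objective, written from the commonly described steps of the algorithm (initial population, fitness-proportional seed production, normally distributed dispersal with a shrinking standard deviation, and competitive exclusion). It is not the authors' implementation: the function names, the box bounds, every parameter value, and the synthetic one-term sine data are assumptions made for illustration.

```python
import math
import random


def rmse(params, times, cases, model):
    """Root mean square error between measured cases and model(t, params)."""
    sq = sum((y - model(t, params)) ** 2 for t, y in zip(times, cases))
    return math.sqrt(sq / len(cases))


def iwo_minimize(cost, bounds, iterations=60, init_pop=10, max_pop=25,
                 min_seeds=1, max_seeds=5, sigma_init=0.5, sigma_final=0.001,
                 modulation=3, seed=0):
    """Minimize cost over box bounds; returns (best_params, best_cost, history)."""
    rng = random.Random(seed)
    weeds = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(init_pop)]
    scored = sorted((cost(w), w) for w in weeds)
    history = [scored[0][0]]
    for it in range(iterations):
        # Dispersal spread (as a fraction of each range) shrinks nonlinearly.
        frac = ((iterations - it) / iterations) ** modulation
        sigma = frac * (sigma_init - sigma_final) + sigma_final
        best, worst = scored[0][0], scored[-1][0]
        offspring = []
        for c, w in scored:
            # Weeds with lower cost produce more seeds (linear in cost).
            ratio = 1.0 if worst == best else (worst - c) / (worst - best)
            n_seeds = min_seeds + int(round(ratio * (max_seeds - min_seeds)))
            for _ in range(n_seeds):
                child = [min(max(x + rng.gauss(0.0, sigma * (hi - lo)), lo), hi)
                         for x, (lo, hi) in zip(w, bounds)]
                offspring.append((cost(child), child))
        # Competitive exclusion: keep only the best max_pop plants.
        scored = sorted(scored + offspring)[:max_pop]
        history.append(scored[0][0])
    return scored[0][1], scored[0][0], history


def one_sine(t, p):
    """Single-term sine model a*sin(b*t + c) with p = (a, b, c)."""
    a, b, c = p
    return a * math.sin(b * t + c)


# Synthetic data generated from hypothetical coefficients, sampled every 5 days.
times = list(range(0, 260, 5))
cases = [one_sine(t, (1000.0, 2 * math.pi / 90, 0.5)) for t in times]
bounds = [(0.0, 2000.0), (0.01, 0.2), (-math.pi, math.pi)]
best, best_cost, history = iwo_minimize(
    lambda p: rmse(p, times, cases, one_sine), bounds)
```

Because parents compete with their offspring for the `max_pop` places, the best cost recorded in `history` never increases from one iteration to the next. How close the result comes to the generating coefficients depends on the assumed bounds and parameter values.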
5. Conclusions
In this article, both Fourier models and sum of sine-waves models are used to predict the upcoming coronavirus peaks in the countries under study. For the first category countries (Egypt and Saudi Arabia), the models gave three different scenarios for each country, with different wave peaks and times of occurrence. According to the obtained results, Egypt and Saudi Arabia will be exposed to only a second wave up to 10 April 2021. For the second category countries (United Kingdom, Italy, and Germany), the models gave three different scenarios for the upcoming coronavirus wave peak. Most of the models predict that these countries will experience two consecutive wave peaks and a third virus wave before 10 April 2021. In these countries, the spread of the virus will be controlled because the duration of the second wave is limited and the daily confirmed cases decrease after the second wave reaches its peak. For the third category countries (the United States and Russia), the models predict that these countries will reach the peak of the third coronavirus wave before 10 April 2021 and will suffer from consecutive, increasing peaks. Finally, all predictive models for the countries under study are compared with the current smoothed data of daily confirmed cases to check their prediction accuracy. In the case of Egypt, the only country without a vaccination effect, the predictive models give curve shapes very close to the actual smoothed data. For the remaining countries, with different numbers of vaccinated people per cumulative case, the predictive models are helpful tools for forecasting virus behavior up to day 307 (1 January 2021). After that date, the vaccination effect starts to limit the virus transmission rate and the next wave is damped in these countries. In future work, we will develop the current predictive models further to account for how vaccination affects the virus spread rate.