1 INTRODUCTION

Today, researchers all over the world are working to create mechanisms for detecting and containing the spread of COVID-19. Predicting the spread of the disease can help solve this serious problem. Observation and analysis of the spread of the coronavirus suggest that humanity is faced with a synchronized process. The data collected and used to develop forecasts are often time series, that is, they describe the evolution of a process over time. Therefore, to predict the process, one can apply well-known forecasting methods, with preliminary analysis and processing of the data, as well as neural network technologies.

The goal of any forecast is to create a model that makes it possible to look into the future and assess the trend of some factor. The quality of the forecast depends on the available history of the variable being predicted, the measurement error of the quantity under consideration, and other factors. Formally, the forecasting problem is stated as follows: find a function f that estimates the value of the variable x at time (t + d) from its N previous values, so that:

$$x(t + d) = f(x(t), x(t - 1), \ldots, x(t - N + 1)).$$

Usually, d is assumed to be equal to one, i.e., the function f predicts the next value of x.
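As an illustration of this formal statement, the sketch below fits a linear approximation of f to a toy series with N = 3 and d = 1 using ordinary least squares. The series values, the depth N, and the variable names are hypothetical, chosen only to make the definition concrete; the paper itself uses a neural network for f.

```python
import numpy as np

# Toy daily series (hypothetical values, for illustration only).
x = np.array([10, 12, 15, 19, 24, 30, 37, 45, 54, 64], dtype=float)

N = 3  # number of previous values fed to f
# Build (x(t), x(t-1), x(t-2)) -> x(t+1) pairs.
inputs = np.array([x[t - N + 1:t + 1][::-1] for t in range(N - 1, len(x) - 1)])
targets = x[N:]

# Approximate f by a linear model fitted with least squares.
coef, *_ = np.linalg.lstsq(inputs, targets, rcond=None)

# One-step-ahead (d = 1) prediction of the next value of the series.
next_value = coef @ x[-1:-N - 1:-1]
print(round(next_value, 1))
```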

It is already clear that the coronavirus pandemic has affected the economies of all countries of the world. On the one hand, it is necessary to address the problems associated with reduced consumption of almost all the resources that form the basis of a country's export potential. On the other hand, it is necessary to stimulate the production and consumption of goods and services within the country. In this situation, it is important to obtain predicted values of the COVID-19 coronavirus infection process for specific dates.

In data analysis, forecasting predicts some unknown quantity from a set of related values; hence, it is performed using data mining tasks such as regression, classification, and clustering.

Predicting the spread of the coronavirus is essential for developing protective and behavioral measures for the population. The difficulty in modeling such a system is that the daily evolution of COVID-19 and the number of new potential cases cannot be captured by a simple mathematical equation. There are many reasons for this. The spread of infection among humans generally depends on various factors related both to human behavior and to the biological structure of the coronavirus. In any case, research is needed to describe the coronavirus biologically in order to develop medical treatment and to model the spread, which will help prevent new cases and focus on the places with the greatest potential needs. According to [1], predicting the spread of the coronavirus is very important for operational action planning. Unfortunately, coronaviruses are not easy to control, since the speed and reach of their spread depend on many factors, from environmental to social. In [1], research results on developing a neural network model for predicting the spread of COVID-19 are presented. The prediction process itself is based on the classical approach of training a neural network with a deep architecture using the NAdam training model. For training, the authors used official data from the government and open repositories.

In [2], deep learning was used to identify and diagnose patients with COVID-19 using X-ray images of the lungs. The authors presented two algorithms for diagnosing the disease: a deep neural network (DNN) applied to fractal features of the images and a neural network (SNN) applied to the lung images directly. The results show that the presented methods can detect infected areas of the lungs with high accuracy (83.84%).

Several works are devoted to COVID-19 detection using neural networks. The authors of [3–5] propose a method based on a convolutional neural network (CNN) built on the EfficientNet architecture for automated COVID-19 diagnostics. They also propose the architecture of a computerized medical diagnostics system to support healthcare professionals in the decision-making process of diagnosing the disease.

Several important forecasting models have been introduced in recent months. In [6], machine learning was applied to evaluate how the outbreak would unfold. However, predicting the situation in the case of COVID-19 is not easy, since rapid changes are determined by many factors [7], and many approaches have therefore been tried. In [8], the flow prediction was performed using a mathematical model that estimated undetected infections in China. Sometimes even elementary techniques are used: when a solution is needed immediately, prediction can start from preprocessing in which some cases are simply removed for a model applied on a Euclidean network [9]. In Japan, prognostic models also evaluated the first symptoms of the disease [10]. One of Italy's first models used the Gauss error function and Monte Carlo simulation on registered cases [11]. Stochastic predictors also provide potential help in the early days, when not enough data are available for machine learning approaches [12]. Such stochastic models seem to work even for huge societies, such as India [13].

When artificial intelligence is applied in the first days of forecast periods, the results mostly relate to a single region or country. One of the first approaches for China was presented in [14]. An interesting discussion of the principles of mathematical modeling was presented in [15]. Some methodologies predict the number of new cases and make assumptions about growth dynamics [16]. There are many sources of information for such predictions: as reported in [17], social networks can bring valuable information about confirmed cases of the disease and its further spread. The relationship between new cases and the rate or coverage of growth can be transferred into a prediction elsewhere, as shown in [18], where knowledge was transferred from Italy to Hunan province in China. The case of the ship Diamond Princess was discussed in [19]. Some models assess the situation in larger regions or more than one country: in [20] and [21], an applied forecasting model was defined for working with data from China, Italy, and France, while some models consider only the total number of cases worldwide [22]. The model proposed in [23] is a complex solution: its neural network architecture was developed to forecast new cases in various countries and regions, consists of seven layers, and outputs the predicted number of new cases. In [24], a shallow long short-term memory (LSTM) based neural network was used to predict the risk category by country; the results show that the proposed pipeline outperforms state-of-the-art methods on data from 180 countries and can be a useful tool for such risk categorization. In [25], a combination of an LSTM-SAE network model, clustering of the world's regions, and Modified Auto-Encoder networks was used to predict future COVID-19 cases for Brazilian states. A comprehensive review of artificial intelligence and nature-inspired computing models is presented in [26].
To predict time-dependent processes, one can, among other things, use an adaptive neuro-fuzzy inference system (ANFIS), an artificial neural network based on a fuzzy inference system that was developed in the early 1990s [27]. ANFIS integrates the principles of neural networks with those of fuzzy logic. To use ANFIS most efficiently, some authors recommend tuning its parameters with a genetic algorithm [28].

2 METHODS

The solution of the forecasting problem using a trained neural network presupposes, first of all, the availability of statistical data on the spread of this disease by day, provided by the Federal Service for Surveillance on Consumer Rights Protection and Human Well-being (https://yandex.ru/covid19/stat?utm_source=main_notif&geoId=1) for the world as a whole (Table 1, see Appendix A).

The statistical data obtained in the form of a time series require significant processing to form a training sample and obtain a dataset suitable for the neural network. This process usually includes the following steps:

— Time-series adjustment—smoothing and removing anomalies.

— Study of the time series, highlighting its components (trend, seasonality, cyclicity, noise)—autocorrelation analysis.

— Data processing using the sliding window method.

— Data processing using a multi-layer neural network, neural network training.

— Selecting the appropriate forecasting method.

— Assessment of the accuracy of forecasting and the adequacy of the chosen forecasting method.

The analysis of the above points and numerous experiments allowed us to propose a general scheme for analytical processing of the statistical source data to obtain a dataset for a neural network, with subsequent training of the network and forecasting. The block diagram of the dataset generation algorithm for the neural network and prediction of COVID-19 coronavirus infection cases is shown in Fig. 1.

Fig. 1. Block diagram of the dataset generation algorithm for the neural network and prediction of cases of COVID-19 coronavirus infection.

2.1 Adjustment of Time Series

The graph of identified cases of COVID-19 coronavirus infection in the world as of October 23, 2020, is shown in Fig. 1. To obtain a forecast on the required scale, it is necessary to change the time scale of the data series and optimize it for further processing. If data by day are fed to a predictive model (neural network, linear model), the forecast will be by day; if the data have first been converted to weekly intervals, the forecast will be by week. If necessary, the date can be converted to a number or a string for further processing.

In our case, we proceeded from the need to obtain a forecast by days; therefore, having performed the necessary transformations of the initial data into the “date” scale (year + day), we obtain the corresponding graph of the initial data in the indicated scale (Fig. 2):
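As a rough illustration of such rescaling, the sketch below uses pandas to aggregate a hypothetical daily series of detected cases to weekly totals and to derive a numeric “year + day” representation. The column names, values, and the exact encoding of the “year + day” scale are assumptions, not the paper's actual data pipeline.

```python
import pandas as pd

# Hypothetical daily counts of detected cases (column names are assumed).
df = pd.DataFrame({
    "date": pd.date_range("2020-03-01", periods=10, freq="D"),
    "cases": [15, 18, 22, 30, 41, 55, 60, 72, 80, 95],
}).set_index("date")

# The forecast scale follows the data scale: daily data -> daily forecast,
# weekly aggregation -> weekly forecast.
weekly = df["cases"].resample("W").sum()

# A numeric "year + day" scale, e.g. 2020061 for the 61st day of 2020 (assumed encoding).
df["year_day"] = df.index.year * 1000 + df.index.dayofyear
print(weekly.head())
print(df.head())
```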

Fig. 2. Graph of detected cases of COVID-19 coronavirus infection in the world by date as of March 27, 2021, in the “date: year + day” scale.

2.2 Smoothing and Removal of Anomalies: Spectral Data Processing

The purpose of spectral processing is to smooth ordered data sets using a wavelet or Fourier transform. The principle of such processing is the decomposition of the original time series function into basis functions. It is most often used for preliminary data preparation in forecasting tasks.

At the “Spectral Processing” step of the processing wizard, the “Wavelet Transform” method was selected, and the decomposition depth and the order of the wavelet were set. The decomposition depth determines the “scale” of the details to be filtered out: the larger this value, the “larger” the details discarded from the source data. With sufficiently large parameter values (about 7–9), the data are not only cleared of noise but also smoothed (sharp outliers are “cut off”). Too large a decomposition depth can lead to a loss of useful information due to excessive “coarsening” of the data. The order of the wavelet determines the smoothness of the reconstructed data series: the lower the parameter value, the more pronounced the “outliers” remain, and, conversely, with large parameter values the “outliers” are smoothed out. Figure 3 shows the result of smoothing and removing anomalies by spectral processing with the “Wavelet transform” method using average values of the method's parameters.
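The exact implementation inside Deductor is not public; as a rough analogue, the sketch below smooths a noisy series with PyWavelets by discarding the finest detail coefficients of a discrete wavelet decomposition. The wavelet family ('db4'), the decomposition level, and the number of levels zeroed out are assumptions standing in for the “order” and “depth” parameters described above.

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(0)
t = np.arange(300)
# Hypothetical noisy epidemic-like curve (logistic growth + noise).
series = 1e5 / (1 + np.exp(-(t - 150) / 25)) + rng.normal(0, 2e3, t.size)

wavelet = "db4"   # plays the role of the wavelet "order"
level = 6         # plays the role of the "decomposition depth"
coeffs = pywt.wavedec(series, wavelet, level=level)

# Zero the finest detail coefficients: the more levels dropped,
# the "larger" the details removed from the data.
drop = 3
for i in range(1, drop + 1):
    coeffs[-i] = np.zeros_like(coeffs[-i])

smoothed = pywt.waverec(coeffs, wavelet)[: series.size]
```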

Fig. 3. Graph of detected cases of COVID-19 coronavirus infection in the world, smoothed by spectral processing with the “Wavelet transform” method.

2.3 Autocorrelation Analysis of Data

The purpose of autocorrelation analysis is to determine the degree of statistical dependence between different values (samples) of a random sequence formed by the data sample field. In the course of autocorrelation analysis, correlation coefficients (a measure of mutual dependence) are calculated for pairs of sample values separated by a certain number of samples, called the lag. The set of correlation coefficients over all lags is the autocorrelation function (ACF) of the series:

$$R(k) = \mathrm{corr}(X(t), X(t + k)),$$

where k ≥ 0 is an integer (the lag).

The ACF behavior can be used to judge the nature of the analyzed sequence, i.e. the degree of its smoothness and the presence of periodicity (for example, seasonal) or a trend.

For k = 0, the autocorrelation function is maximal and equal to 1. As the lag increases, i.e., as the distance between the two values for which the correlation coefficient is calculated grows, the ACF value decreases because the statistical interdependence between these values weakens (the occurrence of one of them has less effect on the probability of occurrence of the other). The faster the ACF decreases, the faster the analyzed sequence changes; conversely, if the ACF falls off slowly, the corresponding process is relatively smooth. If there is a trend in the original sample (a smooth increase or decrease of the series), the ACF also changes smoothly. If there are seasonal fluctuations in the original data set, the ACF will show periodic spikes. Figure 4 shows a graph of the autocorrelation function of detected COVID-19 cases in the world. From this graph one can visually identify a trend on the curve out to lags of about 290.
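A minimal sketch of how such an ACF can be computed with NumPy is given below; the input is assumed to be the `smoothed` series from the previous sketch.

```python
import numpy as np

def acf(series: np.ndarray, max_lag: int) -> np.ndarray:
    """Sample autocorrelation R(k) = corr(X(t), X(t + k)) for k = 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / var
                     for k in range(max_lag + 1)])

# `smoothed` is assumed to be the wavelet-smoothed series from the previous sketch.
r = acf(smoothed, max_lag=290)
print(r[0], r[1])  # R(0) = 1 by construction; a slow decay indicates a trend
```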

Fig. 4. Graph of the autocorrelation function of detected cases of COVID-19 coronavirus infection in the world.

2.4 Data Processing by a Sliding Window

Data processing by the sliding window method is used to preprocess data in forecasting tasks when the values of several neighboring samples of the original dataset must be fed to the input of the neural network. The term “sliding window” reflects the essence of the processing: a specific contiguous piece of data, called a window, is selected, and the window then “slides” over the entire set of initial data. The operation results in a selection in which each record contains a field corresponding to the current sample (with the same name as in the original selection) and, to its left and right, fields containing samples shifted from the current one into the past and the future, respectively.

Sliding window processing has two parameters: the immersion depth, the number of samples in the “past,” and the forecast horizon, the number of samples in the “future.” In this article, the sliding window method with an immersion depth of 282 was applied to the spectrally smoothed series of detected COVID-19 cases in the world; the forecast horizon was taken equal to one. The result was a dataset for training a neural network (see the sketch below).

The Deductor Studio analytical platform was chosen to predict the coronavirus in the Russian Federation and Moscow under the current conditions (www.basegroup.ru). Deductor Studio is the analytical core of the Deductor platform. It contains a complete set of mechanisms for data import, processing, visualization, and export for fast and efficient information analysis, and it focuses on state-of-the-art methods for extracting, cleaning, manipulating, and visualizing data. With it, one can use modeling, forecasting, clustering, pattern search, and many other technologies for knowledge discovery in databases and data mining. Deductor Studio includes a full set of mechanisms for getting information from any data source, performing the entire processing cycle (cleaning, transforming data, building models), displaying the results in the most convenient way (OLAP, tables, charts, decision trees…), and exporting the results (https://docplayer.ru/49814110-Naznachenie-i-osnovnye-vozmozhnosti-analiticheskoy-platformy-deductor-studio.html).
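A minimal sketch of the sliding window transformation, assuming the smoothed worldwide series as input: the function builds input/target pairs for an immersion depth of 282 and a forecast horizon of 1, as described above.

```python
import numpy as np

def sliding_window(series: np.ndarray, depth: int, horizon: int = 1):
    """Build a dataset: `depth` past samples as inputs, the value `horizon` steps ahead as target."""
    x = np.asarray(series, dtype=float)
    inputs = np.array([x[i:i + depth]
                       for i in range(len(x) - depth - horizon + 1)])
    targets = x[depth + horizon - 1:]
    return inputs, targets

# Immersion depth 282 and horizon 1, as used for the worldwide dataset.
X, y = sliding_window(smoothed, depth=282, horizon=1)
print(X.shape, y.shape)
```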

Data processing is performed using a multilayer neural network. In this mode, the “Processing Wizard” of the Deductor analytical platform makes it possible to construct a neural network with a given structure, determine its parameters, and train it using one of the training algorithms available in the system. The result is a neural network emulator that can be used to solve problems of forecasting, classification, finding hidden patterns, data compression, and many other applications [29].

Configuring and training a neural network consists of the following steps:

— setting up field assignments,

— adjusting the normalization of the fields,

— setting up a training sample,

— configuring the structure of the neural network,

— selecting the algorithm and configuring the training parameters,

— setting the conditions for stopping training,

— starting the training process,

— selecting the data display method.

When configuring the neural network, in the “Neurons in layers” section, you must specify the number of hidden layers, i.e., the layers of the neural network located between the input and output layers. The number of neurons in the input and output layers is automatically set according to the number of input and output fields of the training sample, and it cannot be changed here.

The choice of the number of hidden layers and the number of neurons in each hidden layer should be approached carefully. It is believed that a problem of any complexity can be solved using a two-layer neural network [29]. When choosing the number of neurons, the following rule should be observed: the number of connections between neurons should be about an order of magnitude less than the number of examples in the training set. The number of connections is calculated as the connections of each neuron with all neurons of the neighboring layers, including the connections in the input and output layers. Too many neurons can lead to so-called “overfitting” of the network, when it produces good results on the examples included in the training sample but practically fails on other examples.
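This rule of thumb is easy to check mechanically. The sketch below counts the weights of a fully connected feed-forward network and derives the training set size the rule implies; the layer sizes are hypothetical, not those of the paper's model.

```python
def count_connections(layer_sizes: list[int]) -> int:
    """Number of weights in a fully connected feed-forward network (biases ignored)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical structure: 10 inputs, one hidden layer of 5 neurons, 1 output.
connections = count_connections([10, 5, 1])   # 10*5 + 5*1 = 55
print(connections)

# Rule of thumb: the training set should contain about an order of
# magnitude more examples than there are connections.
print("examples needed:", connections * 10)   # ~550
```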

In the “Activation function” section, you need to determine the type of neuron activation function and its steepness. To do this, in the “Function type” list, select the desired activation function, and in the “Steepness” field, set its steepness.

Neural networks differ from traditional statistical methods but have some similarities with them. For example, a traditional linear regression model acquires knowledge through the least-squares method and expresses this knowledge in its regression coefficients; in this sense, a regression model can be considered a neural network, and linear regression is a special case of neural networks of a certain type. However, linear regression imposes several assumptions before information is extracted from the data: a hypothesis about the form of the relationship between the dependent and independent variables is put forward in advance. In neural networks, by contrast, the shape of the relationship is determined during the learning process.
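This correspondence can be demonstrated directly: a single linear neuron trained by gradient descent converges to the ordinary least-squares coefficients. The sketch below is purely illustrative, with synthetic data and assumed hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(0, 0.1, 200)

# Closed-form least squares (with an intercept column appended).
A = np.column_stack([X, np.ones(len(X))])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# The same model as a single linear neuron trained by gradient descent.
w = np.zeros(3)
for _ in range(2000):
    grad = 2 * A.T @ (A @ w - y) / len(y)  # gradient of the mean squared error
    w -= 0.1 * grad

print(np.round(beta, 3))  # ~[ 3. -2.  1.]
print(np.round(w, 3))     # converges to the same coefficients
```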

In this mode, the processing wizard allows you to define the structure of the neural network, determine its parameters, and train it using one of the algorithms available in the system.

Configuring and training a neural network consists of the following steps:

(1) Configuring field assignments. Here you need to determine how the fields of the source data set will be used when training the neural network and when working with it in practice.

(2) Setting up field normalization. The goal of normalizing field values is to transform the data to the form most suitable for processing by the neural network.

(3) Setting up the training sample. Here the sample for building the neural network model is divided into two sets: training and test. The training set includes the records that will be used as input data, together with the corresponding desired output values.

The test set includes records that contain input and desired output values but are used to test the results of the model, rather than to train it.

(4) Adjusting the structure of the neural network. At this stage, the parameters determining the structure of the network are set, such as the number of hidden layers and the number of neurons in them, as well as the neuron activation function. In the “Neurons in layers” section, you need to set the number of hidden layers, that is, the layers of the neural network located between the input and output layers.

(5) Choosing the algorithm and training parameters. At this step, we select the neural network training algorithm and set its parameters.

(6) Setting the conditions for stopping training. At this step, we set the conditions under which training will be terminated: the discrepancy between the reference and actual network output becoming smaller than a specified value, or a set number of epochs (training cycles) after which training stops regardless of the error value.

(7) Starting the learning process. At this step, we start the actual process of training the neural network (a sketch mirroring steps (1)–(7) is given after Fig. 6).

(8) Choosing a way to display data. At this step, we choose how the imported data will be presented. In our case, the following specialized visualizers are of interest:

(8.1) Contingency table and scatter diagrams. Choosing an appropriate forecasting method amounts to determining whether the method produces satisfactory forecast errors. In addition to calculating the errors, they are compared in a special visualizer, the scatter diagram (Fig. 5). The scatter diagram shows the output values for the training sample set (dataset) for the entire world.

Fig. 5. Scatter diagram of the trained neural network for the worldwide dataset.

The X axis is the output value from the training sample (the reference), and the Y axis is the output value calculated by the trained model on the same example. The straight diagonal line is the reference (the line of ideal values): the closer a point lies to this line, the smaller the model error.

The scatter plot allowed us to compare several models to determine which model provides the best accuracy on the training set.

(8.2) Diagram. The diagram displays the dependence of the values of one field on another. The most commonly used chart type is a 2D graph: its horizontal axis shows the values of the independent column, and the vertical axis the corresponding values of the dependent column.

After building the model, to assess the quality of training we present the obtained data as a diagram of the current and reference values for the worldwide dataset (Fig. 6).

Fig. 6. Diagram of the trained neural network for the worldwide dataset.

Analysis of the scatter diagram (Fig. 5) and the trained neural network diagram for the worldwide dataset (Fig. 6) suggests that the neural network has been trained successfully.
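Deductor's internals are proprietary, but the configuration steps above can be mirrored with open tools. The sketch below, a minimal illustration rather than the authors' actual setup, reproduces steps (1)–(7) with scikit-learn: field normalization, a train/test split, one hidden layer, a sigmoid activation, and tolerance- and epoch-based stopping conditions. The hidden layer size and stopping values are assumptions.

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

# X, y are assumed to come from the sliding window sketch above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)            # step 3: training and test sets

scaler = MinMaxScaler()                            # step 2: field normalization
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

model = MLPRegressor(
    hidden_layer_sizes=(10,),   # step 4: one hidden layer of 10 neurons (assumed)
    activation="logistic",      # sigmoid activation function
    solver="adam",              # step 5: training algorithm
    tol=1e-5,                   # step 6: stop when improvement falls below tol...
    max_iter=2000,              # ...or after a fixed number of epochs
    random_state=0,
)
model.fit(X_train_s, y_train)   # step 7: training

# Reference vs. model output, as in the scatter diagram of Fig. 5.
print(model.score(X_test_s, y_test))  # R^2 on the test set
```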

3 RESULTS

Forecasting allows you to get a prediction of the values of a time series for the number of samples corresponding to the specified forecast horizon.

What is the maximum forecast horizon? The following rule is recommended: the amount of statistical data should be 10–15 times greater than the forecast horizon. Given the length of the available daily series, this means that in our case the maximum forecast horizon is 30 days.

When performing the actual forecast, we first configure several fields: the forecast horizon (set to 20 days), the “forecast step” and “source data” fields, and the color and scale parameters. Enabling the “forecast step” field (checking the box) adds a “forecast step” field to the resulting selection, which for each record indicates the number of the forecast step that produced it.

“Source data”: selecting this check box includes in the resulting selection not only the records that contain the predicted values but also all records containing the source data. In this case, the records containing the forecast are located at the end of the resulting selection.
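With a one-step-ahead model, a 20-day forecast is typically obtained recursively: each prediction is appended to the window and fed back as input. A minimal sketch, assuming the `model`, `scaler`, and `smoothed` series from the previous sketches:

```python
import numpy as np

depth, horizon = 282, 20
window = list(smoothed[-depth:])   # the most recent `depth` observations
forecast = []

for step in range(horizon):
    x = scaler.transform(np.array(window[-depth:]).reshape(1, -1))
    next_value = float(model.predict(x)[0])
    forecast.append(next_value)    # the value produced at this forecast step
    window.append(next_value)      # feed the prediction back as input

print(forecast)
```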

The final graph of the predicted number of COVID-19 infections by date, obtained using neural network technologies, is shown in Fig. 7 (worldwide). The proposed model, once built, cannot “work” indefinitely: new data on the number of infections in the world keep arriving, so the model should be periodically reviewed and retrained.

Fig. 7. Graph for predicting the number of COVID-19 infections by date using neural technologies (worldwide).

4 CONCLUSIONS

In this paper, we solve the problem of predicting COVID-19 cases in the world using neural networks. This approach is useful when it is necessary to overcome difficulties related to non-stationarity, incompleteness, or unknown distribution of the data, or when statistical methods are unsatisfactory. The forecasting problem is solved using the Deductor Studio analytical platform, developed by BaseGroup Labs (www.basegroup.ru, Russian Federation, Ryazan). In solving this problem, we used mechanisms for cleaning the data from noise and anomalies, which ensure the quality of the predictive model, and obtained forecast values for tens of days ahead. The full cycle of time series forecasting was also demonstrated: import, seasonality detection, cleaning, smoothing, building a predictive model, and predicting COVID-19 cases in the world using neural technologies for thirty days ahead.