1 Introduction

In recent months, we have observed the behavior of the latest global pandemic, COVID-19, and how it has affected countries worldwide with different consequences. There are countries with high numbers of confirmed and death cases, such as China, Brazil, and the USA, as well as countries that have managed to keep their confirmed and death cases low (HDX 2020). The COVID-19 pandemic has motivated numerous investigations aimed at finding risk factors, symptoms, treatments, predictions, and sequelae. In Zhang et al. (2020), the authors describe the characteristics of COVID-19 patients with type-2 diabetes and analyze the risk factors for severity. For their analysis, they collected information about demographics, symptoms, treatments, and outcomes of COVID-19 patients with diabetes. They concluded that patients with type-2 diabetes are more susceptible to COVID-19. In Sakalli et al. (2020), the authors determine the frequency and severity of symptoms, especially the loss of the senses of smell and taste in COVID-19 disease; patients with a positive COVID-19 diagnosis were questioned about general information such as age, sex, date of symptom onset, and smoking history, as well as about their most apparent symptoms. They conclude that loss of smell and taste are symptoms related to COVID-19. In Jin et al. (2020), the authors analyzed the clinical use and efficacy of clinically approved drugs. They analyzed drug development progress for the treatment of COVID-19 in China, intending to provide information for epidemic control in other countries. Regarding prediction, recent works have addressed the prediction of confirmed COVID-19 cases either for a specific country or worldwide. In Torrealba-Rodriguez et al. (2020), the authors presented the modeling and prediction of confirmed COVID-19 cases in Mexico using mathematical and computational models. They proposed the Gompertz, logistic, and inverse artificial neural network models to predict information for the next eight days (from May 9 to 16). In Salgotra et al. (2020), the time series of COVID-19 in India is forecast using genetic programming. The authors analyze the COVID-19 information on confirmed and death cases for the whole country and for the states most affected by the pandemic: Maharashtra, Gujarat, and Delhi. To perform this analysis, they applied gene expression programming (GEP) to generate reliable models that predict the next 10 days. In Shastri et al. (2020), the authors proposed deep learning models based on recurrent neural networks to analyze COVID-19 cases in India and the USA. According to their results, the confirmed and death cases for both countries will rise in the next 30 days. In Kırbas et al. (2020), confirmed COVID-19 cases of Denmark, Belgium, Germany, France, the UK, Finland, Switzerland, and Turkey are modeled with the autoregressive integrated moving average (ARIMA), nonlinear autoregression neural network (NARNN), and long short-term memory (LSTM) approaches. They conclude that their LSTM model provides a better prediction for the next 14 days. In previous works, we applied intelligence techniques such as ensemble neural networks (ENN), fuzzy logic (FL), and self-organizing maps (SOM) to analyze COVID-19 information. In Melin et al. (2020), an analysis of the coronavirus pandemic evolution by self-organizing maps (a type of unsupervised neural network) is performed.
The achieved results allowed the countries to be grouped according to their rates of confirmed, recovered, and death cases. These kinds of results support decisions about strategies for pandemic control around the world. In Melin et al. (2020), we applied ensemble neural networks to predict confirmed and death COVID-19 cases of 12 states in Mexico. For each state, the ensemble neural network is formed with three neural networks, and a type-1 fuzzy inference system is used to combine their responses through weighted average integration. The achieved results were compared with the individual performance of each neural network. In most cases, the proposed integration achieved better results than conventional monolithic neural networks when predicting 10 future days. However, we also aim to propose a general method applicable to other countries. An essential part of developing such a method is finding optimal architectures of ensemble neural networks. These architectures allow predictions that match the behavior of each country, i.e., there are countries whose cases increase at a constant rate and others that have days when the number of cases unexpectedly shoots up. Hence, it is crucial to find an optimal architecture for the behavior of each country, and for this reason, an optimization technique is used. In this work, a firefly algorithm is proposed because we have already applied this optimization technique to pattern recognition in previous work, specifically to human recognition using biometric measures (Sánchez et al. 2017). It provided better neural network architectures than other optimization techniques, such as the genetic algorithm (GA) (Goldberg 1989; Sánchez and Melin 2014), gray wolf optimizer (GWO) (Mirjalili et al. 2014; Sánchez et al. 2017), and particle swarm optimization (PSO) (Eberhart and Kennedy 1995; Eberhart and Shi 2000; Sánchez et al. 2020), when the amount of data available for the training phase of the neural networks is reduced. In this work, the number of neural networks that form the ensemble and their architecture parameters, such as the number of hidden layers, neurons, and goal error, are optimized. We propose a type-2 fuzzy integration to improve performance over other integration techniques, such as the conventional average and the type-1 fuzzy weighted average. The optimization of ensemble neural network architectures with a firefly algorithm is proposed to improve on the results of conventional monolithic neural networks and to correctly predict more days than previous works. The proposed method proved its effectiveness by comparing its results for confirmed and death COVID-19 cases of 26 countries: Austria, Belgium, Bolivia, Brazil, China, Ecuador, Finland, France, Germany, Greece, India, Iran, Italy, Mexico, Morocco, New Zealand, Norway, Poland, Russia, Singapore, Spain, Sweden, Switzerland, Turkey, the UK, and the USA. The main contribution of the proposed method is the optimization of the ensemble neural network architecture and the combination of responses using a type-2 fuzzy inference system that assigns a weight to each prediction, achieving in this way an efficient prediction of 20 future days (from 06/28/2020 to 07/17/2020).

This paper is organized as follows. The intelligence techniques applied in this work are briefly described in Sect. 2. In Sect. 3, the proposed method is described. In Sect. 4, the achieved experimental results are presented and explained. The statistical comparisons of results are presented in Sect. 5. The conclusions are finally given in Sect. 6.

2 Intelligence techniques

In this section, a brief description of the techniques applied in the proposed method is presented.

2.1 Ensemble neural network

An artificial neural network is a popular intelligent technique that simulates abilities of the human brain, such as its capability to learn and to generalize information. Its cells are emulated with interconnected units (known as neurons) that manage weights. These weights store knowledge during the learning process (Aggarwal 2018). Figure 1 shows an artificial neuron j with inputs (x1, x2, …, xn) and associated weights (w1, w2, …, wn) called synaptic weights.

Fig. 1 Artificial neuron

The weighted inputs are summed as:

$$ y_{j} = \sum \limits_{i = 1}^{n} w_{i} x_{i} $$
(1)

This summation is the activation of neuron j. The output of neuron j is finally computed by an activation function, and this output becomes the input of another neuron (except in the output layer). In ANNs, the activation function is usually nonlinear (for example, hyperbolic tangent or sigmoid), which allows learning complex patterns and nonlinear behaviors. A conventional artificial neural network has three kinds of layers: input, hidden, and output, where each layer contains neurons interconnected among layers. The input layer transmits the input information; the network can have one or several hidden layers that send information to the output layer, which produces the final result (Gurney 1997; Haykin 1998). In Fig. 2, an example of an artificial neural network is shown. The neurons of the input and hidden layers are connected to all neurons in the next layer. The information is propagated through the network up to the output layer.
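As an illustration of Eq. (1), the following minimal Python sketch computes the output of a single neuron as the weighted sum of its inputs passed through a nonlinear activation (hyperbolic tangent here); the input and weight values are hypothetical and chosen only for the example.

```python
import numpy as np

def neuron_output(x, w, activation=np.tanh):
    """Weighted sum of Eq. (1) followed by a nonlinear activation."""
    y = np.dot(w, x)  # y_j = sum_i w_i * x_i
    return activation(y)

# Hypothetical inputs and synaptic weights for one neuron
x = np.array([0.5, -1.2, 0.3])
w = np.array([0.8, 0.1, -0.4])
print(neuron_output(x, w))
```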

Fig. 2 Artificial neural network

An ensemble neural network is composed of several monolithic artificial neural networks (also known as modules). All the artificial neural networks are trained for the same task (Hansen and Salomon 1990; Soto et al. 2015), so each neural network becomes an expert on the same problem and provides an answer; these answers can differ. In this work, for example, each artificial neural network provides a different prediction, even though each one has learned the same information. For this reason, to obtain a final answer or decision, the individual answers are combined using an integration unit (Pulido and Melin 2014; Pulido et al. 2014). Figure 3 shows a representation of an ensemble neural network. We used this kind of neural network because it has been an excellent tool for time series prediction (Pulido and Melin 2014; Soto et al. 2015): each neural network gives us a prediction, and through an integration method, a final prediction is obtained.

Fig. 3 Example of an ensemble neural network

2.2 Type-2 fuzzy logic

Fuzzy logic, proposed by L.A. Zadeh in 1965 (Zadeh 1965; Zadeh 1998), is an intelligent technique successfully used to model complex systems and derive useful fuzzy relations or rules. In Boolean logic, an element belongs absolutely to a set (1) or not (0). In type-1 fuzzy logic, the element can partially belong to a set with a membership grade represented by a crisp number in [0, 1]. An example of a type-1 membership function is shown in Fig. 4.

Fig. 4 Membership function of a type-1 fuzzy set

A type-1 fuzzy set A is characterized by a type-1 membership function \( \mu_{A}(x) \), where \( x \in X \) and X is the universe of discourse (Castro et al. 2007). It can be represented as a set of ordered pairs of elements x and their membership values:

$$ A = \left\{ \left( x, \mu_{A}(x) \right) \mid \forall x \in X \right\} $$
(2)

L.A. Zadeh also proposed the concept of a type-2 fuzzy set in 1975 (Zadeh 1975). The membership of an element is defined with a fuzzy membership function, i.e., the membership grade for each element of the set is a fuzzy set in [0, 1]. This type of fuzzy logic is recommended for situations where it is complicated to assign a crisp number in [0, 1] as in type-1 fuzzy logic (Al-Jamimi and Saleh 2019; Melin and Castillo 2005). A type-2 fuzzy set \( \tilde{A} \) can be defined as:

$$ \tilde{A} = \left\{ \left( \left( x,u \right), \mu_{\tilde{A}}(x,u) \right) \mid \forall x \in X,\ \forall u \in J_{x} \subseteq \left[ 0,1 \right],\ \mu_{\tilde{A}}(x,u) \in \left[ 0,1 \right] \right\} $$
(3)

where the domain of the fuzzy variable is denoted by X. The primary membership of x is denoted by \( J_{x} \subseteq \left[ 0,1 \right] \), and the secondary membership is a type-1 fuzzy set denoted by \( \mu_{\tilde{A}}(x,u) \). The uncertainty is represented by a region known as the footprint of uncertainty (FOU). The set is an interval type-2 membership function if \( \mu_{\tilde{A}}(x,u) = 1,\ \forall u \in J_{x} \subseteq \left[ 0,1 \right] \), as Fig. 5 shows with uniform shading for the FOU bounded by its upper \( \bar{\mu}_{\tilde{A}}(x) \) and lower \( \underline{\mu}_{\tilde{A}}(x) \) membership functions (Melin and Castillo 2014; Mittal et al. 2020). An interval type-2 fuzzy set can be defined as:

Fig. 5 Membership function of an interval type-2 fuzzy set

$$ \tilde{A} = \left\{ \left( \left( x,u \right),1 \right) \mid \forall x \in X,\ \forall u \in J_{x} \subseteq \left[ 0,1 \right] \right\} $$
(4)

The FOU can be defined as the union of all the primary memberships \( J_{x} \):

$$ \mathrm{FOU}\left( \tilde{A} \right) = \bigcup_{x \in X} J_{x} $$
(5)

The \( \mathrm{FOU}\left( \tilde{A} \right) \) is delimited by the upper membership function (UMF) and the lower membership function (LMF), defined as:

$$ \bar{\mu}_{\tilde{A}}(x) = \overline{\mathrm{FOU}\left( \tilde{A} \right)} $$
(6)
$$ \underline{\mu}_{\tilde{A}}(x) = \underline{\mathrm{FOU}\left( \tilde{A} \right)} $$
(7)

A basic structure of a type-2 fuzzy inference system (T2FIS) has the components shown in Fig. 6. These components are: (a) fuzzifier: converts the crisp input values to fuzzy values; (b) inference: applies fuzzy reasoning to obtain a type-2 fuzzy output; (c) type reducer: transforms a type-2 fuzzy set into a type-1 fuzzy set; (d) defuzzifier: maps the output to crisp values; and (e) rule base: contains the fuzzy if–then rules and a set of membership functions known as the database (Karnik et al. 1999a, b). The decision process is conducted by the inference system using the fuzzy if–then rules. These fuzzy rules define the connection between input and output fuzzy variables. The inference system evaluates all the rules from the rule base and combines the weighted consequents of all the relevant rules into a single fuzzy set using the aggregation operation (Castillo et al. 2008; Karnik et al. 1999b).
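To make the interval type-2 Gaussian membership function with uncertain mean concrete (the shape used later in Eq. (12)), the following Python sketch evaluates its lower and upper membership functions; it assumes the common formulation with a fixed sigma and a mean lying in [m1, m2], and the parameter values are hypothetical.

```python
import numpy as np

def gauss(x, m, sigma):
    return np.exp(-0.5 * ((x - m) / sigma) ** 2)

def it2_gaussian_uncertain_mean(x, m1, m2, sigma):
    """Lower and upper membership of an interval type-2 Gaussian MF
    with fixed sigma and uncertain mean m in [m1, m2] (m1 < m2)."""
    x = np.asarray(x, dtype=float)
    upper = np.where(x < m1, gauss(x, m1, sigma),
                     np.where(x > m2, gauss(x, m2, sigma), 1.0))
    lower = np.minimum(gauss(x, m1, sigma), gauss(x, m2, sigma))
    return lower, upper

# Hypothetical parameters: mean uncertain between 0.4 and 0.6, sigma = 0.2
lo, up = it2_gaussian_uncertain_mean(np.linspace(0, 1, 5), 0.4, 0.6, 0.2)
```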

Fig. 6 Structure of a type-2 fuzzy inference system

2.3 Firefly algorithm

The firefly algorithm was initially proposed in Yang (2009) and Yang and He (2013) and is based on the behavior and flashing of fireflies. Three basic principles are used in this algorithm: (1) the fireflies are unisex, so a firefly can be attracted to any other firefly regardless of sex; (2) the attractiveness of a firefly is proportional to its brightness, and between a pair of fireflies, the less bright one moves toward the brighter one; if both have the same brightness, the firefly moves randomly; and (3) the objective function determines the brightness of a firefly. The variation of attractiveness β with the distance r is proposed in Yang and He (2013) and given by the equation:

$$ \beta = \beta_{0} e^{{ - r^{2} }} $$
(8)

where \( \beta_{0} \) is the attractiveness at r = 0. The movement of a firefly i toward a brighter firefly j in the next iteration is defined by the equation:

$$x_{i}^{{t + 1}} = x_{i}^{t} + \beta _{0} e^{{ - r_{{ij}} ^{2} }} \left( {x_{j}^{t} - x_{i}^{t} } \right) + \alpha _{t} \epsilon _{i}^{t} $$
(9)

where \( x_{i}^{t} \) represents the position of firefly i at iteration t, \( \beta_{0} e^{{ - r_{ij}^{2} }} \left( {x_{j}^{t} - x_{i}^{t} } \right) \) represents the attraction between firefly j and firefly i, and \( \epsilon_{i}^{t} \) is a vector of random numbers whose randomization parameter \( \alpha_{t} \) decays from the initial randomness scaling factor \( \alpha_{0} \) according to:

$$ \alpha_{t} = \alpha_{0} \delta^{t} $$
(10)

where δ is a value between 0 and 1. The values for \( \alpha \), β, and δ applied in this work are based on the recommendations of other works. To avoid local minima, this algorithm uses a random array, which allows the fireflies to keep moving and avoids stagnation.
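The following Python sketch shows how Eqs. (8)–(10) could drive the movement of a single firefly toward a brighter one; the dimensionality, β0, α, and δ values are illustrative only and do not reproduce the exact parameter settings of Table 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def firefly_step(x_i, x_j, beta0=1.0, alpha=0.2):
    """Move firefly i toward a brighter firefly j, Eq. (9)."""
    r2 = np.sum((x_i - x_j) ** 2)                 # squared distance r_ij^2
    beta = beta0 * np.exp(-r2)                    # attractiveness, Eq. (8)
    eps = rng.uniform(-0.5, 0.5, size=x_i.shape)  # random vector epsilon_i^t
    return x_i + beta * (x_j - x_i) + alpha * eps

# Hypothetical pair of fireflies in a three-dimensional search space
x_i = np.array([0.2, 0.7, 0.1])
x_j = np.array([0.5, 0.3, 0.9])
alpha, delta = 0.2, 0.97
for t in range(5):
    x_i = firefly_step(x_i, x_j, alpha=alpha)
    alpha *= delta                                # randomness decay, Eq. (10)
```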

3 Proposed method

The proposed method combines ensemble neural networks, type-2 fuzzy integration, and the firefly algorithm, and its general architecture is described in this section.

3.1 General architecture description

The proposed method consists of ensemble neural networks (ENNs), where the predictions of each artificial neural network (also known as a module) are combined using a type-2 fuzzy weighted average, and a firefly algorithm is applied to optimize the ensemble neural network architecture. In Fig. 7, the general architecture is shown. An ENN can have from 1 to “m” artificial neural networks, where the firefly algorithm establishes the value of “m,” and the prediction of each module (testing set and the next 20 days) is combined using a type-2 fuzzy inference system.

Fig. 7 General architecture of proposed method

3.1.1 Description of the ensemble neural network

In this work, three types of neural networks are used to form an ensemble neural network:

  1. Feedforward neural network: This kind of neural network has three types of layers: input, hidden, and output, where the neurons of each layer are connected with the subsequent layer, except the neurons of the output layer, which produce the outputs of the neural network (Che et al. 2011; Gauthier and Micheau 2012).

  2. Function fitting neural network: This kind of neural network is very similar to the feedforward neural network. It fits a function during the training process, where inputs are used to produce associated target outputs. This neural network is usually applied to function approximation and time series prediction (Chen et al. 2020; Moradikazerouni et al. 2019).

  3. Cascade-forward neural network: This neural network is similar to a feedforward network but also has connections directly from the input layer to the subsequent layers (An et al. 2020; Budak et al. 2020).

The prediction error of neural network k, k = {1, 2, 3, …, m}, is given by the equation:

$$ MSE_{k} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {y_{i} - \hat{y}_{ki} } \right)^{2} $$
(11)

where \( y_{i} \) is the real value at time i, \( \hat{y}_{ki} \) is the prediction of neural network k at time i, and N is the number of data points in the testing set. The value of m (the number of neural networks or modules) is defined by the optimization technique.

3.1.2 Description of the type-2 fuzzy weighted average integration

In this work, type-2 fuzzy logic is applied: a Mamdani type-2 fuzzy inference system is proposed to combine the responses of the ensemble neural network. The number of inputs and outputs is determined by the number of neural networks that form the ensemble. The fuzzy inference system takes as inputs the prediction error (MSE) of each module (from module #1 to module #m). The outputs are the weights used to combine the predictions, allowing a final prediction of the ensemble neural network to be obtained. In Fig. 8, an example of the type-2 fuzzy inference system for three modules is presented.

Fig. 8 Type-2 fuzzy inference system for integration

The fuzzy if–then rules are automatically generated depending on the number of inputs (modules) of the FIS. Each variable (input and output) has three Gaussian membership functions with the linguistic labels “low,” “medium,” and “high.” The range of each fuzzy output variable is 0 to 1. Meanwhile, for the inputs, the range adapts to the neural network errors, i.e., the range is generated based on the prediction errors (MSE, normalized between 0 and 1) of the neural networks: the errors are sorted, and the minimal and maximal values are taken to establish the range of all the fuzzy input variables. As the input ranges are adaptive, a new type-2 fuzzy inference system is generated for each evaluation of the ensemble neural network.

In this work, type-2 Gaussian symmetric membership functions with uncertain mean are used, given by Eq. (12). An example of this kind of membership function is shown in Fig. 9.

Fig. 9 Type-2 Gaussian membership function

$$ \mu \left( x \right) = \operatorname{igaussmtype2}\left( x, \left[ \sigma,\ m_{1,k},\ m_{2,k} \right] \right) $$
(12)

It is important to emphasize that the firefly algorithm does not optimize the fuzzy inference system. Only the prediction error (MSE) of each neural network that forms the ensemble is used to establish the ranges of the fuzzy input variables. The minimal and maximal limits of the fuzzy input variables are given by Eqs. (13) and (14). Meanwhile, the fuzzy output variable values are established as shown in Fig. 10. The difference between \( R_{ {\min} } \) and \( R_{ {\max} } \) is defined by Eq. (15).

$$ R_{ {\min} } = {\min} \left( {MSE_{1} ,MSE_{2} ,MSE_{3} , \ldots ,MSE_{m} } \right) $$
(13)
$$ R_{ {\max} } = {\max} \left( {MSE_{1} ,MSE_{2} ,MSE_{3} , \ldots ,MSE_{m} } \right) $$
(14)
$$ R_{\text{dif}} = R_{ {\max} } - R_{ {\min} } $$
(15)

where \( m_{1} < m_{2} \). Sigma is represented by \( \sigma \), and the values \( m_{1,k} \) and \( m_{2,k} \) represent mean1 and mean2, respectively, where k = 1, 2, 3 indexes the membership functions of each fuzzy input variable. The \( \sigma \) value for the input variables is established using Eq. (16). The separation between mean1 and mean2 is defined by Eq. (17).

Fig. 10 Example of type-2 fuzzy output variable

$$ \sigma = \left( {R_{{\max} } - R_{{\min} } } \right)*0.2 $$
(16)
$$ R_{\text{s}} = R_{\text{dif}} *0.10 $$
(17)

The mean values for each of the three membership functions used in each fuzzy variable are given by Eqs. (18)–(23).

$$ m_{1,1 } = R_{{\min} } - R_{s} $$
(18)
$$ m_{2,1 } = R_{{\min} } + R_{s} $$
(19)
$$ m_{1,2 } = \left( {\frac{{R_{\text{dif}} }}{2} + R_{ {\min} } } \right) - R_{s} $$
(20)
$$ m_{2,2 } = \left( {\frac{{R_{\text{dif}} }}{2} + R_{ {\min} } } \right) + R_{s} $$
(21)
$$ m_{1,3 } = R_{{\max} } - R_{s} $$
(22)
$$ m_{2,3 } = R_{{\max} } + R_{s} $$
(23)

An example of the fuzzy output variable design is shown in Fig. 10, where \( R_{ {\min} } \) is equal to 0 and \( R_{ {\max} } \) is equal to 1. Equations (18)–(23) are applied to generate the fuzzy input variable parameters; an example of the resulting fuzzy input variable is shown in Fig. 11.
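A minimal Python sketch of the parameter generation described by Eqs. (13)–(23) is given below: from the module errors it derives sigma and the uncertain-mean pairs for the "low," "medium," and "high" membership functions of one fuzzy input variable. The MSE values used in the example are hypothetical.

```python
def t2_input_mf_params(mse_list):
    """Sigma and uncertain-mean pairs (m1_k, m2_k) for one fuzzy input
    variable, following Eqs. (13)-(23)."""
    r_min, r_max = min(mse_list), max(mse_list)   # Eqs. (13)-(14)
    r_dif = r_max - r_min                         # Eq. (15)
    sigma = r_dif * 0.2                           # Eq. (16)
    r_s = r_dif * 0.10                            # Eq. (17)
    means = {
        "low":    (r_min - r_s, r_min + r_s),                          # Eqs. (18)-(19)
        "medium": (r_dif / 2 + r_min - r_s, r_dif / 2 + r_min + r_s),  # Eqs. (20)-(21)
        "high":   (r_max - r_s, r_max + r_s),                          # Eqs. (22)-(23)
    }
    return sigma, means

# Hypothetical normalized MSE values of a three-module ensemble
sigma, means = t2_input_mf_params([0.08, 0.15, 0.32])
```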

Fig. 11 Type-2 fuzzy input variable

The total number of possible fuzzy if–then rules is given by the equation:

$$ FR = 3^{m} $$
(24)

where m is the number of inputs (modules) forming the ensemble neural network. The fuzzy if–then rules are formed to combine all neural network predictions based on their prediction errors; a code sketch of this rule generation is given after the list below. An example of the fuzzy if–then rules when the ENN has two modules (m = 2) is the following:

  1. If (e1 is small) and (e2 is small), then (w1 is high) and (w2 is high).

  2. If (e1 is small) and (e2 is medium), then (w1 is high) and (w2 is medium).

  3. If (e1 is small) and (e2 is high), then (w1 is high) and (w2 is low).

  4. If (e1 is medium) and (e2 is small), then (w1 is medium) and (w2 is high).

  5. If (e1 is medium) and (e2 is medium), then (w1 is medium) and (w2 is medium).

  6. If (e1 is medium) and (e2 is high), then (w1 is medium) and (w2 is low).

  7. If (e1 is high) and (e2 is small), then (w1 is low) and (w2 is high).

  8. If (e1 is high) and (e2 is medium), then (w1 is low) and (w2 is medium).

  9. If (e1 is high) and (e2 is high), then (w1 is low) and (w2 is low).
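The rule base above follows a simple mirrored pattern (the smaller the error of a module, the higher its weight), so the 3^m rules can be enumerated automatically. A minimal Python sketch of this generation, assuming exactly that error-to-weight mapping, is:

```python
from itertools import product

ERROR_TO_WEIGHT = {"small": "high", "medium": "medium", "high": "low"}

def generate_rules(m):
    """Enumerate the 3**m if-then rules for an ensemble with m modules."""
    rules = []
    for combo in product(ERROR_TO_WEIGHT, repeat=m):  # all antecedent combinations
        antecedent = " and ".join(f"(e{k + 1} is {label})"
                                  for k, label in enumerate(combo))
        consequent = " and ".join(f"(w{k + 1} is {ERROR_TO_WEIGHT[label]})"
                                  for k, label in enumerate(combo))
        rules.append(f"If {antecedent}, then {consequent}.")
    return rules

rules = generate_rules(2)  # reproduces the nine rules listed above
```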

As previously mentioned, the type-2 fuzzy inference system takes as inputs the MSE values of each neural network. After defuzzification, the type-2 FIS outputs the corresponding weights (as numeric values) for each neural network according to its prediction error (MSE), and the final prediction is given by the equation:

$$ P = \frac{{w_{1} \hat{y}_{1} + w_{2} \hat{y}_{2} + \cdots + w_{m} \hat{y}_{m} }}{{w_{1} + w_{2} + \cdots + w_{m} }} $$
(25)

where \( w_{1} \) is the weight of module #1, \( w_{2} \) is the weight of module #2, and so on up to \( w_{m} \), the weight of module #m; \( \hat{y}_{1} \) is the prediction of module #1, \( \hat{y}_{2} \) is the prediction of module #2, and so on up to \( \hat{y}_{m} \), the prediction of module #m.
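A short Python sketch of the weighted average of Eq. (25) is shown below; the module predictions and the weights (which in the proposed method come from the type-2 FIS) are hypothetical values used only to illustrate the computation.

```python
import numpy as np

def fuzzy_weighted_average(predictions, weights):
    """Final ensemble prediction, Eq. (25)."""
    predictions = np.asarray(predictions, dtype=float)  # shape: (m, horizon)
    weights = np.asarray(weights, dtype=float)          # shape: (m,)
    return weights @ predictions / weights.sum()

# Hypothetical three-module ensemble predicting a four-day horizon
preds = [[100, 104, 109, 115],
         [ 98, 101, 105, 110],
         [103, 108, 114, 121]]
final = fuzzy_weighted_average(preds, weights=[0.9, 0.4, 0.7])
```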

3.1.3 Description of the firefly algorithm for time series prediction

The main contribution of this method is determining which and how many neural networks are needed to perform a good prediction. The firefly algorithm aims at finding optimal ensemble neural network architectures. The architecture consists of:

  1. Size of the ensemble neural network (number of neural networks/modules).

  2. Selection of neural networks (feedforward, function fitting, or cascade-forward neural network).

  3. Number of hidden layers and their neurons for each neural network.

  4. Goal error for each neural network.

The backpropagation algorithm used in the training phase to perform the learning process is the Levenberg–Marquardt (LM) algorithm, which has achieved better results with artificial neural networks applied to time series forecasting (Pulido and Melin 2014; Pulido et al. 2014). In this work, three feedback delays are also applied. The objective function is to minimize the MSE of the ensemble neural network on the testing set and is given by the equation:

$$ f = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {Y_{i} - P_{i} } \right)^{2} $$
(26)

where \( Y_{i} \) is the real value at time i, \( P_{i} \) is the prediction of the ensemble neural network at time i, and N is the number of data points in the testing set.
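To clarify how a firefly position can be interpreted, the following Python sketch decodes a real-valued vector into a candidate ENN architecture and evaluates the fitness of Eq. (26). The bounds, the encoding order, and the goal-error range are hypothetical stand-ins for the actual search space of Table 1.

```python
import numpy as np

# Hypothetical bounds; the actual ranges are those of Table 1.
NET_TYPES = ["feedforward", "function fitting", "cascade-forward"]
MAX_MODULES, MAX_LAYERS, MAX_NEURONS = 5, 3, 30

def decode(position):
    """Map a real-valued position in [0, 1]^d to an ENN architecture."""
    pos = iter(position)
    m = 1 + int(round(next(pos) * (MAX_MODULES - 1)))      # number of modules
    modules = []
    for _ in range(m):
        net_type = NET_TYPES[int(round(next(pos) * (len(NET_TYPES) - 1)))]
        layers = 1 + int(round(next(pos) * (MAX_LAYERS - 1)))
        neurons = [1 + int(round(next(pos) * (MAX_NEURONS - 1))) for _ in range(layers)]
        goal_error = 10 ** (-1 - 5 * next(pos))             # e.g. 1e-1 .. 1e-6
        modules.append((net_type, neurons, goal_error))
    return modules

def objective(y_test, ensemble_pred):
    """Fitness of Eq. (26): MSE of the ensemble on the testing set."""
    y_test, ensemble_pred = np.asarray(y_test), np.asarray(ensemble_pred)
    return np.mean((y_test - ensemble_pred) ** 2)

arch = decode(np.random.default_rng(1).random(40))  # 40 values cover the largest case
```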

In Table 1, the minimum and maximum values of the search space used to establish the ensemble neural network architecture are shown. These parameters are based on previous works on pattern recognition (Pulido et al. 2014; Sánchez et al. 2017a, b).

Table 1 Search space for the ENN architectures

In Table 2, the parameters used to perform the evolutions of this algorithm are shown. The values of the number of fireflies and the maximum number of iterations are based on Sánchez and Melin (2014) and Sánchez et al. (2017), and the values of α, β, and δ are based on the parameters recommended in Yang (2009) and Yang and He (2013). In Fig. 12, the diagram of the proposed method is illustrated.

Table 2 Table of parameters
Fig. 12 Diagram of the proposed method

3.2 Dataset description

The dataset is from the Humanitarian Data Exchange (HDX 2020) and contains information about COVID-19 cases of countries around the world. The period from 01/22/20 to 06/27/20 was selected for the training, validation, and testing sets. This period consists of 158 days with information on confirmed and death cases. In this work, 26 countries are analyzed: Austria, Belgium, Bolivia, Brazil, China, Ecuador, Finland, France, Germany, Greece, India, Iran, Italy, Mexico, Morocco, New Zealand, Norway, Poland, Russia, Singapore, Spain, Sweden, Switzerland, Turkey, the UK, and the USA. In Figs. 13 and 14, the confirmed and death cases by country are, respectively, shown.
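A minimal Python sketch of the data partition described above is given below, assuming (as the figures suggest) that the testing set is the most recent portion of the series and that the remaining days are split 80/20 into training and validation; the series itself is synthetic.

```python
import numpy as np

def split_series(series, test_fraction=0.30):
    """Hold out the last test_fraction of the series as the testing set
    and split the remaining days 80/20 into training and validation."""
    series = np.asarray(series, dtype=float)
    n_test = int(round(len(series) * test_fraction))
    learn, test = series[:-n_test], series[-n_test:]
    n_train = int(round(len(learn) * 0.8))
    return learn[:n_train], learn[n_train:], test

# Synthetic cumulative-case series standing in for the 158 days (01/22/20-06/27/20)
cases = np.cumsum(np.random.default_rng(2).integers(0, 500, size=158))
train, val, test = split_series(cases, test_fraction=0.30)
```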

Fig. 13 Confirmed cases

Fig. 14 Death cases

4 Experimental results

The proposed method is applied to the prediction of the COVID-19 time series of confirmed and death cases for 26 countries. The optimized results are obtained using 30%, 20%, and 10% of the information as the testing set (black points in the graphs), because we wanted to know how much information is necessary to achieve good generalization, leaving the rest (70%, 80%, and 90%, respectively) for the learning phase (blue points in the graphs), divided into training and validation sets (80/20). The results achieved by the proposed method are compared against the conventional average method and the type-1 fuzzy weighted average integration proposed in Melin et al. (2020), performing 30 runs per country (in each test). Each neural network (module) of the ensemble performs a prediction of the next 20 days (pink points in the graphs). To integrate these predictions, the weights of Eq. (25) are used to obtain a final prediction of the next 20 days in the type-1 and type-2 fuzzy average integration tests. It is essential to mention that the prediction error presented in the following tables is based on the testing set. We present comparative figures with the real values of the next days, predicting confirmed and death cases for the next 20 days; these figures show whether the techniques with a better prediction (lower MSE) are also useful for predicting the next days. In this section, only the results for China, the USA, and Mexico and their predictions of the next 20 days are shown. In Sect. 4.1, summaries of the results for the 26 countries are given. The tables presented in this section show the best architecture obtained by the firefly algorithm in each test, with parameters such as size (number of neural networks), type of neural networks, number of hidden layers for each neural network with their respective neurons, individual MSE, integration method, and ensemble neural network MSE.

In Table 3, the best architectures for confirmed cases of China are presented; for all the tests, the best architecture uses three modules. The best result is obtained when 30% of the data points are used for the testing phase with three function fitting neural networks.

Table 3 Best architecture of ENN for China (confirmed cases)

In Fig. 15, the prediction of each module for the confirmed cases of China is shown, where 30% of the data points are used for the testing phase and the conventional average method is applied as integration. In Fig. 15a, the prediction of the next 20 days (pink points) tends to decrease, which indicates a bad future prediction, but because the other modules have good predictions, the final integration improves, as Fig. 15d shows.

Fig. 15 Individual behavior of (a) Module #1, (b) Module #2, (c) Module #3, and (d) final prediction for confirmed cases (China)

The average convergence of each test for confirmed cases of China is shown in Fig. 16, where the runs with the type-2 fuzzy integration perform better than the other methods. The type-1 FWA integration has a convergence very similar to that of the average method, except when 10% is used for the testing phase, where the average method obtains better performance.

Fig. 16 Average convergence of confirmed cases for China using (a) 30%, (b) 20%, and (c) 10% for the testing phase

The average predictions of the next 20 days for each test for confirmed cases of China are shown in Fig. 17. As these results show, the type-2 fuzzy logic test (20% testing set) achieved the prediction closest to the real data up to the eighth day (Day #166, 07/05/2020). This occurs because the previously confirmed cases were increasing slowly, which caused the neural networks to learn this pattern, making it difficult for all the techniques to predict more days. Note on the Y-axis that the number of cases increases in steps of 100. The type-1 FWA integration, however, was closer to the number of real cases at the end of the 20 days.

Fig. 17 Prediction data for confirmed cases (China)

In Table 4, the best architectures for death cases of China are shown. The function fitting neural network prevails as the best neural network. For death cases, the best architecture has four modules using type-2 FWA integration with 30% of the data points used for the testing phase.

Table 4 Best architecture of ENN for China (death cases)

In Fig. 18, the prediction of each module for the death cases of China is shown, where 30% of the data points are used for the testing phase and the type-2 FWA is used as the integration method. In Fig. 18b and c, the prediction of the next 20 days tends to decrease, but the other modules combined with the type-2 FWA integration allow a more stable prediction, as Fig. 18e shows. The type-2 fuzzy variables generated for this ensemble neural network are shown in Fig. 19.

Fig. 18 Individual behavior of (a) Module #1, (b) Module #2, (c) Module #3, (d) Module #4, and (e) final prediction for death cases (China)

Fig. 19 Type-2 fuzzy variables for death cases (China)

The average convergence of each test for death cases of China is shown in Fig. 20, where the runs with the three integration methods behave similarly, but the type-2 fuzzy integrator achieved better results than the conventional average method and the type-1 FWA.

Fig. 20 Average convergence of death cases for China using (a) 30%, (b) 20%, and (c) 10% for the testing phase

The average predictions of the next 20 days for each test for death cases of China are shown in Fig. 21; as these results show, the type-2 fuzzy logic test (10% testing set) achieved the prediction closest to the real data up to the seventeenth day (Day #175, 07/14/2020).

Fig. 21 Prediction data for death cases (China)

In Table 5, the best architectures for confirmed cases of the USA are presented. The best architecture has four modules using the type-1 FWA as integration.

Table 5 Best architecture of ENN for the USA (confirmed cases)

In Fig. 22, the prediction of each module for the confirmed cases of the USA is shown, where 30% of the data points are used for the testing phase with the type-1 FWA as the integration method. Figure 22a shows how the prediction begins ascending but starts to descend after a few days. This situation does not affect the final result shown in Fig. 22d because the other modules had better predictions, which allowed the final prediction of the next 20 days to rise as expected.

Fig. 22 Individual behavior of (a) Module #1, (b) Module #2, (c) Module #3, and (d) final prediction for confirmed cases (USA)

The average convergence of each test for confirmed cases of the USA is shown in Fig. 23. The runs with the three integration methods behave similarly when 30% of the data points are used as the testing set, but the type-1 FWA achieved a better average than the other integration methods. When 20% and 10% of the data points are used for the testing phase, the type-2 FWA had better performance, while the type-1 FWA integration and the average method had very similar convergence.

Fig. 23 Average convergence of confirmed cases for the USA

The average predictions of the next 20 days for each test for confirmed cases of the USA are shown in Fig. 24. As these results show, the type-2 fuzzy logic test (20% testing set) achieved the prediction closest to the real data up to the thirteenth day (Day #171, 07/10/2020).

Fig. 24 Prediction data for confirmed cases (USA)

In Table 6, the best architectures for death cases of the USA are shown; for all the tests, the best architecture uses three modules. The cascade-forward neural network prevails in these results, where type-2 FWA integration is applied.

Table 6 Best architecture of ENN for the USA (death cases)

In Fig. 25, the prediction of each module for death cases of the USA is shown, where 30% of the data points are used for the testing phase with the type-2 FWA as the integration method. The prediction of the next 20 days for each module is good, although for modules 2 and 3 (Fig. 25b and c, respectively) the prediction ascends faster. The type-2 FWA integration allowed a good final prediction, shown in Fig. 25d, with a more gradual increase.

Fig. 25 Individual behavior of (a) Module #1, (b) Module #2, (c) Module #3, and (d) final prediction for death cases (USA)

The type-2 fuzzy variables generated for this ensemble neural network are shown in Fig. 26.

Fig. 26 Type-2 fuzzy variables for death cases (USA)

The average convergence of each test for death cases of the USA is shown in Fig. 27, where the runs with the type-2 FWA integration perform better only when 30% of the data points are used for the testing phase. In the other tests, the average method achieved better performance.

Fig. 27 Average convergence of death cases for the USA

The average predictions of the next 20 days for the tests for death cases of the USA are shown in Fig. 28. As these results show, the type-2 fuzzy logic test (30% testing set) achieved the prediction closest to the real data up to the ninth day (Day #167, 07/06/2020).

Fig. 28 Prediction data for death cases (USA)

In Table 7, the best architectures for confirmed cases of Mexico are shown. The function fitting neural network prevails as the best neural network, and the best architecture has four modules using type-2 FWA integration.

Table 7 Best architecture of ENN for Mexico (confirmed cases)

In Fig. 29, the prediction of each module for the confirmed cases of Mexico is shown, where 10% of the data points are used for the testing phase and a type-2 fuzzy inference system is used as the integration method. The predictions of the next 20 days shown in Fig. 29b–d show a faster increase in confirmed cases. Their combination with the prediction of Module #1, shown in Fig. 29a, allows a better final prediction using the type-2 fuzzy weighted integration.

Fig. 29 Individual behavior of (a) Module #1, (b) Module #2, (c) Module #3, (d) Module #4, and (e) final prediction for confirmed cases (Mexico)

The type-2 fuzzy variables generated for this ensemble neural network are shown in Fig. 30.

Fig. 30 Type-2 fuzzy variables for confirmed cases (Mexico)

The average convergence of each test for confirmed cases of Mexico is shown in Fig. 31. The runs with the three integration methods also behave similarly when 30% of the data points are used for the testing phase, but the type-2 fuzzy integrator achieved a better average than the other integrations in all the tests. In Fig. 31b, the average method and the type-1 FWA achieved very similar behavior, while in Fig. 31c, the type-1 FWA had the worst performance.

Fig. 31 Average convergence of confirmed cases for Mexico

The average predictions of the next 20 days for each test for confirmed cases of Mexico are shown in Fig. 32. As these results show, the type-2 fuzzy logic test (30% testing set) achieved the prediction closest to the real data up to the tenth day (Day #168, 07/07/2020).

Fig. 32 Prediction data for confirmed cases (Mexico)

In Table 8, the best architectures for death cases for Mexico are presented. In this case, the best architecture has four modules using the average method.

Table 8 Best architecture of ENN for Mexico (death cases)

In Fig. 33, the prediction of each module for death cases of Mexico is shown, using the type-2 FWA as the integration method. We want to show how the type-2 FWA allows a good prediction even when a module (in this case, module #2, shown in Fig. 33b) has bad performance. The advantage of the proposed integration can be observed in the predictions shown in Fig. 36. The type-2 fuzzy variables generated for this ensemble neural network are shown in Fig. 34.

Fig. 33 Individual behavior of (a) Module #1, (b) Module #2, (c) Module #3, (d) Module #4, and (e) final prediction for death cases (Mexico)

Fig. 34 Type-2 fuzzy variables for death cases (Mexico)

The average convergence of each test for death cases of Mexico is shown in Fig. 35. The runs with the type-2 fuzzy integration perform better than the other methods. The average method and the type-1 FWA have similar performance, although in Fig. 35c the average method had a better result.

Fig. 35 Average convergence of death cases for Mexico

The average predictions of the next 20 days for each test for death cases of Mexico are shown in Fig. 36. As these results show, the type-2 fuzzy logic test (30% testing set) achieved the prediction closest to the real data up to the sixth day (Day #164, 07/03/2020).

Fig. 36 Prediction data for death cases (Mexico)

4.1 Summary of results

This section presents a summary of the results obtained with the conventional average method and the type-1 and type-2 fuzzy weighted averages. The tests were performed using 30%, 20%, and 10% of the data points for the testing phase for confirmed and death COVID-19 cases of 26 countries. In Table 9, the results achieved (MSE) using 30% for the testing phase with the three integration methods are shown for confirmed cases; as the best averages (in bold in the table) indicate, most countries obtain a better result with the type-2 FWA integration. Only for two countries, New Zealand and the USA, did the type-1 FWA have better performance. Meanwhile, the conventional average method only performed well for France.

Table 9 Confirmed cases (30% for testing phase)

In Fig. 37, the results of confirmed cases using a testing set of 30% are graphically illustrated.

Fig. 37 Optimized results of confirmed cases (30% for testing phase)

In Table 10, the results achieved (MSE) using 30% for the testing phase with the integration methods are shown for death cases; as the best averages (in bold in the table) indicate, all the countries obtain a better result with the type-2 fuzzy weighted average integration.

Table 10 Death cases (30% for testing phase)

In Fig. 38, the death case results using a testing set of 30% are graphically illustrated. In Table 11, the results achieved using 20% for the testing phase with the three integration methods are shown for confirmed cases. As the best averages (in bold in the table) indicate, most countries obtain a better result with the type-2 FWA. The average method and the type-1 FWA each had better performance for only one country, New Zealand and Switzerland, respectively.

Fig. 38 Optimized results for death cases (30% for testing phase)

Table 11 Confirmed cases (20% for testing phase)

In Fig. 39, the results of confirmed cases using a testing set of 20% are graphically shown.

Fig. 39 Optimized results of confirmed cases (20% for testing phase)

In Table 12, the results achieved using 20% for the testing phase with the three integration methods are shown for death cases; as the best averages (in bold in the table) indicate, most countries obtain a better result with the type-2 FWA integration. Only for two countries, New Zealand and the USA, did the conventional average method achieve better performance. In Fig. 40, the death case results using a testing set of 20% are graphically shown.

Table 12 Death cases (20% for testing phase)
Fig. 40 Optimized results for death cases (20% for testing phase)

In Table 13, the results achieved using 10% for the testing phase with the three integration methods are shown for confirmed cases; as the best averages (in bold in the table) indicate, most countries obtain a better result with the type-2 FWA integration. The conventional average method only had better performance for Bolivia and the UK, while the type-1 FWA integration only performed better for Finland and Switzerland. In Fig. 41, the results of confirmed cases using a testing set of 10% are graphically shown. In Table 14, the results achieved using 10% for the testing phase with the three integration methods are shown for death cases; as the best averages (in bold in the table) indicate, most countries again obtain a better result with the type-2 FWA integration. The conventional average method only had better performance for Morocco and the USA, while the type-1 FWA only performed better for New Zealand. In Fig. 42, the death case results using a testing set of 10% are graphically shown.

Table 13 Confirmed cases (10% for testing phase)
Fig. 41 Optimized results of confirmed cases (10% for testing phase)

Table 14 Death cases (10% for testing phase)
Fig. 42 Optimized results for death cases (10% for testing phase)

The results shown above indicate that the type-2 FWA method provides, on average, better results in most tests. In the next section, statistical tests are performed to prove its effectiveness.

5 Statistical comparison of results

In this section, the results of Wilcoxon signed-rank tests are presented. The critical values are shown in Table 15, where the values of α depend on the desired statistical significance; in this work, a 0.10 level is used. The averages obtained for each country in each test are used to perform these statistical tests.

Table 15 Critical values for Wilcoxon signed-rank test

In Table 16, the results of the Wilcoxon test statistic for confirmed cases are shown comparing the conventional average method and the type-2 FWA integration proposed in this work.

Table 16 Wilcoxon test results (confirmed cases, Part #1)

To compare the results achieved by the proposed method with a 0.10 level of significance, the value in the column named “W” must be equal to or smaller than the critical value (column named “W0”) to reject the null hypothesis. As the results show, the type-2 FWA integration improved on the results of the conventional average method.
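As an illustration of how such a paired comparison can be computed, the following Python sketch applies the Wilcoxon signed-rank test to two sets of per-country average errors; the values are hypothetical, and scipy reports a p-value that can be compared against the 0.10 significance level instead of looking up W0 in a table.

```python
from scipy.stats import wilcoxon

# Hypothetical per-country average MSEs for two integration methods
avg_mse = [3.2e4, 1.1e5, 8.7e3, 5.4e4, 2.0e5, 9.9e3, 7.5e4, 4.3e4]
t2_mse  = [2.9e4, 9.8e4, 8.9e3, 4.7e4, 1.8e5, 9.1e3, 6.9e4, 4.0e4]

res = wilcoxon(avg_mse, t2_mse)  # paired, two-sided by default
print(f"W = {res.statistic}, p = {res.pvalue:.3f}")  # reject H0 if p < 0.10
```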

In Table 17, the results of the Wilcoxon test statistic for death cases are presented. As the results show, the type-2 FWA also improved on the results of the conventional average method for death cases.

Table 17 Wilcoxon test results (death cases, Part #1)

In Table 18, the results of the Wilcoxon test statistic for confirmed cases are shown, comparing the type-1 and type-2 FWA integrations. As the results show, the type-2 FWA integration improved on the results of the type-1 FWA integration. In Table 19, the results of the Wilcoxon test statistic for death cases are presented. As the results show, the type-2 FWA integration also improved on the results of the type-1 FWA integration for death cases.

Table 18 Wilcoxon test results (confirmed cases, Part #2)
Table 19 Wilcoxon test results (death cases, Part #2)

6 Conclusions

In this paper, a firefly algorithm is proposed to find optimal ensemble neural network architectures, using type-2 fuzzy logic to improve the weighted average integration method and predict confirmed and death COVID-19 cases of 26 countries. The FA finds essential architecture parameters, such as the number of artificial neural networks and their types (feedforward, function fitting, or cascade-forward). As the integration method, we proposed a type-2 fuzzy inference system to calculate the weights for an average method. Its input ranges are based on the prediction errors (MSE) of the artificial neural networks that form the ensemble, i.e., in each evaluation performed by the firefly algorithm, a type-2 fuzzy system is created specifically for the ensemble neural network being evaluated. The inputs of the fuzzy inference system are the corresponding MSE errors. After defuzzification, the outputs are the weights (numeric values) for each prediction according to its MSE, used to obtain a final prediction (testing set and the next 20 days). The results obtained by the proposed integration are compared against the conventional average method and the type-1 fuzzy weighted average. The results show that the type-2 fuzzy weighted average obtained better results (MSE) than the other integration techniques when the final prediction of the testing set is performed, and its prediction of the next days is also the closest to the real data. The other integration methods performed better in only a few countries (one or two), which demonstrates the stability of the proposed integration.

In conclusion, the presented results show that the type-2 fuzzy weighted average integration allows us to obtain a good prediction of the next days, even when a module has a bad result, as in the case of Mexico. The results also show that the number of correctly predicted future days may vary with the country and with the percentage of information used for the training phase of the ensemble neural network. In some results, only six days can be predicted correctly; in others, up to 17 days. Ensemble neural networks are shown to be a useful tool when a good integration unit is applied, as in this work. As future work, the optimization of the fuzzy if–then rules and of the percentage of data used for the training phase of the ensemble neural network is considered. Other optimization techniques will also be used to compare ensemble neural network architectures and to reaffirm our proposed integration.