Abstract

This study aims to forecast the COVID-19 spread in Indonesia involving vaccination factors using vector autoregressive with exogenous variables (VARX). The COVID-19 spread represented by active, recovered, and death case rate indicators acts as endogenous variables, while the COVID-19 vaccination represented by second-dose vaccination rates acts as exogenous variables. Because the sum of three COVID-19 spread indicators in one day is one, only two indicators with the highest correlation rates are involved in VARX modelling. The other indicator is practically projected by subtracting one from the sum of two indicator projection results. Based on the analysis results, the active and recovered case rates are two indicators chosen in VARX modelling. Using Akaike information criterion, the most suitable VARX model to project the case and recovered case rates are VARX (7, 1). This model is expected to help the Indonesian government project the COVID-19 spread in Indonesia.

1. Introduction

COVID-19 has spread in Indonesia for over a year and a half since 11 March 2020. The spread of COVID-19 has continued to record terrible records. Based on data from the Ministry of Health of the Republic of Indonesia, on 23 June 2021, the number of daily confirmed positive cases of COVID-19 in Indonesia recorded a record high, namely 15,308 new cases. This record continued to be broken in the days that followed. The highest record recorded until 7 November 2021 was 56,757 cases, set on 15 July 2021. Then, the highest number of daily deaths due to COVID-19 recorded until 7 November 2021 was 2,069 cases, where the incident occurred on 21 July 2021. The Indonesian government has made various policies to reduce COVID-19 spread rates. One of the main policies is the National Vaccination Program [13]. In the first two months after the start of the policy on 13 January 2021, this policy showed positive developments toward the COVID-19 spread [4]. Active and death case rates of COVID-19 tend to decrease, while COVID-19 recovered case rates increase. As of 7 November 2021, active, recovered, and death case rates of COVID-19 are 0.2548%, 96.3662%, and 3.3790%, respectively. It further gives the Indonesian government a sense of optimism about reducing the COVID-19 spread through vaccination [5].

Knowledge of models that can assist this projection process is needed to project how many vaccination rates must be achieved every day to realize the desired COVID-19 spread rates. Therefore, this study aims to design a model for projecting the COVID-19 spread rates by involving the COVID-19 vaccination in Indonesia. The model used is vector autoregressive with exogenous variables (VARX). This model is chosen because it can explain the causal relationship between endogenous variables, where exogenous variables influence endogenous variables. The COVID-19 spread rates can be measured using various indicators. This study uses three indicators: active, recovered, and death case rates. The sum of three indicators in one day is one. So, to simplify the projection, only two indicators are involved in VARX modelling. The projection of the other indicator is made by subtracting one from the projection results of the two indicators modelled by VARX. Then, the second-dose COVID-19 vaccination rates represent the COVID-19 vaccination factor against the Indonesian population. It is done because people in Indonesia have been wholly vaccinated if they have been vaccinated twice [6]. This research is expected to help the Indonesian government project vaccination rates the next day to control the COVID-19 spread in Indonesia.

2. Literature Review

Forecasting is closely related to time series analysis [7, 8]. Therefore, articles on modelling and forecasting the COVID-19 spread in Indonesia using the concept of time series analysis are discussed briefly in this section. The articles discussed are English articles indexed in the Scopus database. Kirana and Bhawiyuga [9] modelled and forecasted the COVID-19 spread using naive methods. The COVID-19 spread used is the number of monthly COVID-19 cases. Their model’s mean absolute percentage error (MAPE) is 15.85%. On the basis of the accurate classification of the model, Wei [10] stated that this model is suitable for forecasting the number of monthly COVID-19 cases in the next month. Then, Djakaria and Saleh [11] modelled and forecasted the COVID-19 spread in Gorontalo Province, Indonesia, using the Holt-Winters smoothing method. The cumulative number of COVID-19 cases is used as the COVID-19 spread indicator. The MAPE of the model obtained is 6.14%, so the model is perfect for forecasting the cumulative number of COVID-19 cases on the next day based on the model’s accurate classification by Wei [10].

Then, Aditya Satrio et al. [12] analyzed and forecasted the COVID-19 spread in Indonesia, where the COVID-19 spread indicators used are the cumulative number of cases, recovered, and death of COVID-19. Each indicator is modelled separately using the autoregressive integrated moving average (ARIMA) model. Using the same method, Fadly and Sari [13] modelled and forecasted the number of funerals carried out in DKI Jakarta Province, Indonesia, since COVID-19 hit. The articles described in the previous paragraph use univariate time series analysis, whereas multivariate time series analysis can also be used. Fitriani et al. [14] simultaneously model and forecast the number of daily COVID-19-positive cases in Indonesia and Singapore. The multivariate time series model used is vector autoregressive integrated (VARI). The MAPE obtained in each place was 32.74% and 37%, respectively. The MAPE values are enormous, so based on the model’s accurate classification by Wei [10], the model is unsuitable for forecasting. Then, Meimela et al. [15] modelled and forecasted the COVID-19 spread in Indonesia using the vector autoregressive integrated moving average (VARIMA). The COVID-19 spread indicators used are the daily positive and death cases. VARIMA (1, 1, 1) is the best model got.

Also, Akbar et al. [16] analyzed the relationship between COVID-19 active case rates assumed to be influenced by population growth rates. Bandung City and Purwakarta Regency are the two areas considered, whereas the population growth rate in West Java Province represents the population growth rate. The vector autoregressive integrated with exogenous variables (VARX) describes the relationship. The MAPE obtained in each place is 1% and 11.8%, respectively. It aligns with Wei’s accurate classification model (2006); the model is perfect for forecasting the COVID-19 spread in Bandung City, while the other is only good. As explained in the previous paragraphs, no articles in the Scopus database have explored modelling and forecasting active, recovered, and death case rates of COVID-19 in Indonesia, considering vaccination factors using the VARX model. Therefore, this is a good opportunity to perform such modelling and forecasting.

3. Materials and Methods

3.1. Materials

The data used in this study are (1) active, recovered, and death case rates of COVID-19 data and (2) second-dose COVID-19 vaccination rates data. The period considered is from 27 January 2021, when the second-dose vaccination program starts, until 7 November 2021. All data are obtained from the official website of the Science and Technology Index, National Research and Innovation Agency, Ministry of Research and Technology of the Republic of Indonesia.

3.2. Methods
3.2.1. Vector Autoregressive with Exogenous Variables

The general form of the vector autoregressive with exogenous variables (VARX) model with the order of the endogenous variables and the order of the endogenous variables , VARX (), is expressed as follows [1721]:where represents the -dimensional vector containing the endogenous variables at time , express the -order coefficient matrix of , represents the -dimensional vector containing the exogenous variables at time , express the -order coefficient matrix of , and represents the -dimensional random error vector. Equation (1) can also be reformulated as follows [22]:where , , , and .

There are three assumptions of the VARX model presented as follows [23]:(1)The VARX (p, q) model is stationary if the zero generators of the and determinants are outside the unit circle(2) is independent and identically multivariate-normally distributed with zero-vector mean and the constant covariance-matrix variance, and . is a positive semidefinite matrix expressed as follows: , where (3) and are independent

Many criteria can be used to determine the order of the VARX model. Lui et al. [22] stated that the order of the VARX model is determined through the Akaike information criterion (AIC). The model with the smallest AIC is the best. The AIC value of a model can be determined by the following equation [24]:where is the size of observation data, and is many model parameters.

3.2.2. Augmented Dicky-Fuller Test

The unit root test equation for Augmented Dicky-Fuller (ADF) from data that have constant and trend elements is as follows [2527]:where represents observation data at the time , represents the difference between and , is a constant, represents the trend parameter, is the parameter of , represents the optimum lag, and is random error that is independent and identically normally distributed with zero mean and constant variance. The followings are the test hypotheses used: : , and : . The test statistic used is the t-ratio of the parameter estimator estimated using the least square method stated as follows:where is an estimator , and is the standard deviation of . If less than the critical value, then rejected which means the data are stationary and vice versa. If is not rejected, then data must be differentiated until it is stationary [28, 29].

3.2.3. Granger Causality Test

Suppose that there are two stationary time series and . In certain cases, can be affected by and , and can also be affected by and . This relationship is called a bidirectional causality relationship [30, 31]. The Granger causality test is a popular statistical test used to check the relationship.

The relationship between with , which is also called a restricted relationship, is stated as follows [32]:

Then, the relationship between with and , which is also called the unrestricted relationship, is stated as follows [3335]:

The test hypotheses used are as follows: , and . Test statistics determined by the following equation [36]:where and represent the coefficient of determination in equations (6) and (7), respectively. If the test statistic is greater than , then is rejected, which means influenced by and [37]. and have a bidirectional causality if influenced by and , and influenced by and [38, 39].

3.2.4. VARX Parameter Estimation Using Maximum Likelihood (ML) Method

The VARX models with and can be stated as follows:where

Based on equation (9), can be stated as follows:

Because , then . Therefore, the distribution function of can be expressed as follows [40, 41]:

Vector estimator of obtained by maximizing the likelihood function of which is stated as follows:

Generally, equation (14) is transformed into a natural logarithm to facilitate maximization. This function is then referred to as the log-likelihood function, which is expressed as follows:

Vector estimator of is the zero-generator vector of the derivative of equation (15) on . The following is the equation used to estimate the vector :

3.2.5. Diagnostic Test

The diagnostic test is a feasibility test of the model for forecasting. This feasibility test includes checking the assumption of error and the model’s mean absolute percentage error (MAPE). The model errors are tested to determine whether they are independent and identically normally distributed with zero mean and constant variance. This assumption is also known as the white noise assumption. One of the statistical tests commonly used to test whether the errors are independent or not is the Ljung-Box test. The null hypothesis in this test is that the errors are independent, while this test’s alternative hypothesis is the opposite. The test statistic with the lag length denoted by is determined using the following equation [42, 43]:where

Reject if [44, 45].

Meanwhile, the naked eye can check the assumption that the error is normally distributed with zero mean and constant variance using the quantile-quantile plot (Q-Q Plot) [46]. Suppose is the value of the normal cumulative distribution function with zero mean and constant variance of . If the scatter of point pairs is around a line with a gradient one, then is visually normally distributed with zero mean and constant variance. If the assumption of white noise in the model error is met and the MAPE of the model is less than 20%, then the model obtained is suitable for use in forecasting.

4. Results

The symbols used for each data are stated as follows: states daily COVID-19 active case rate data, states daily COVID-19 recovered case rate data, states daily COVID-19 death case rate data, and states second-dose COVID-19 vaccination rate data. The sum of the three indicators in one day is one, so to simplify the modelling and forecasting process, and only two indicators are used in VARX modelling. The remaining indicator is practically projected by subtracting one from the two indicators’ projected results. The two indicators selected are the two indicators that have the highest correlation rates.

4.1. Data Descriptive

Descriptive statistics for each data obtained using Microsoft Excel software are presented in Table 1.

Table 1 shows the intensity of active, recovery, death, and vaccination rates from COVID-19 in Indonesia per day, each of which is different. The intensity values are 7.8119%, 89.2812%, 2.9069%, and 7.8703% per day, respectively. The intensity values of the active and death cases appear to be lower than that of vaccination. It indicates that the number of Indonesian citizens exposed and died from COVID-19 is lower than the number of people vaccinated daily. It shows that the Indonesian government is serious about handling COVID-19 via vaccination. Then, there are deviations from the active, recovery, death, and vaccination rates from COVID-19 against each average daily. On average, the deviation values from the active, recovery, death, and vaccination rates from COVID-19 were 5.1750%, 4.9703%, 0.2807%, and 8.0520% per day, respectively. The daily deviations of the active and death rates are lower than the deviation of the vaccination rate. It indicates that Indonesian citizens are increasingly varied of COVID-19 with a high willingness to vaccinate. It prevents extreme value deviations from occurring. Finally, the variance of , , , and is 0.2678%, 0.2470%, 0.0008%, and 0.6484%, respectively. The interpretation of the variance values is similar to the interpretation of the deviation value. Then, the correlation rates between , , and obtained using Microsoft Excel software are presented in Table 2.

Table 2 displays the correlation rate between and , −0.9993, which is the highest correlation rate among the correlation rates of the other indicator data pairs. It is not surprising because the two are rationally the opposite of each other. If the active case rate increases, the recovered case rate decreases, and vice versa. This fact can be seen visually in Figure 1. Therefore, and are selected as endogenous variables. The high negative correlation between and is not a problem in VARX modelling. It is the same as regression modelling, where if the correlation rate between variables is high, the model formed can provide an excellent description of the data.

Table 3 captures the correlation rates between endogenous and exogenous variables above 0.6. It indicates that endogenous and exogenous variables are strong enough.

4.2. Data Stationary Test

In this section, , , and are checked for stationary visually and formally. Visually, the stationarity of the three data is seen from each graph, while formally, the stationary of the data is tested using the ADF test. Each data graph is observed whether it is around the average line or not. If the data graph is around the average line, then the data are stationary, and vice versa. The graphs of , , and with their respective average lines are presented in Figure 1.

Figure 1 displays that the graphs of the three data do not appear to be around the average line. Therefore, visually, the three data are not stationary. Then, the formal stationarity test of the three data is carried out using the ADF test. The significance level and optimal lag used are 0.05 and 6, respectively. The test statistics of each data are determined using equation (5). The formal stationary test result of the three data using the ADF test is presented in Table 4.

Based on Table 4, all data have a value of not less than the critical value. Therefore, on each data test is not rejected, which means that all the data are not stationary, so the differencing process using equation (3) is carried out. After the data are differentiated, the next step is the three data’s visual and formal stationarity tests. After the differencing process is carried out twice, all data are stationary. The graphs of , , and with their respective average lines are presented in Figure 2.

Figure 2 shows the graphs of three data differentiated twice and appear to be around the average line. Therefore, visually, they are stationary. Next, the formal stationarity test is carried out using the ADF test. The significance level and optimal lag used are 0.05 and 6, respectively. The test statistics of each data are determined using equation (5). Furthermore, a formal summary of the three data stationary tests using the ADF test is presented in Table 5.

Table 5 captures the three data differentiated twice that have a value less than the critical value. Therefore, of each test is rejected, which means that all data are stationary.

4.3. Causality Relationship Test

A check of the existence of a bidirectional causal relationship between and in this study is carried out using the Granger causality test. The significance levels, optimal lag, and residual degrees of freedom used are 0.05, 7, and 256, respectively. The effect of and on is tested first. The value obtained is 2.6931. This value is greater than the critical value, 2.0442. Therefore, is rejected, which means that is affected by and . Next, the effect of and on is tested. The value obtained is 2.7514. This value is also greater than the critical value, 2.0442. Therefore, is rejected which means that is affected by and . Because is affected by and , and is affected by and , then and have a bidirectional causality relationship.

4.4. VARX Order Identification

The AIC value, as described in Section 2, is used as a criterion to determine the best order of endogenous and exogenous variables of the VARX model. The model with the smallest AIC value is the most suitable. The AIC value for each model is determined by equation (3). The maximum order of the endogenous and exogenous variables considered is 10. The AIC value for each model is determined using the R software. Based on the results obtained from the software, the AIC value of the VARX (7, 1), −34.2194, is the smallest AIC value among other models. Therefore, the most suitable model chosen is the VARX (7, 1).

4.5. VARX (7, 1) Parameter Estimation Using ML Method

The VARX (7, 1) parameters are estimated using the ML method, equation (14), as described in Section 2. The estimation process is carried out using the help of Microsoft Excel software. The parameter estimation results are presented in Table 6.

On the basis of the parameters obtained in Table 6, the VARX (7, 1) model is stated in equation (16).

In order to be used to forecast the actual data, equation (18) is transformed to the actual data as follows:

Meanwhile, can be modelled as follows:

4.6. Diagnostic Test of VARX (7, 1)

Independence of both and to time is carried out using the Ljung-Box test described in section 2. The significance levels and the degrees of freedom used are 0.05 and 7, respectively. The test statistic is determined using equation (18). The test statistics of and obtained are 3.3610 and 3.4270, respectively. The test statistic values are smaller than the critical value, 14.0671. Therefore, of each independence test of and to times is not rejected, which means that and are independent of time. Next, the normality test of the VARX (7, 1) model error is visually carried out first through the Q-Q plot. Q-Q plots for and are presented in Figure 3.

Figure 3 indicates the scattering of points in each plot appears to be around a line with one gradient. Therefore, and visually follow a normal distribution with zero mean and constant variances, namely 0.00108 and 0.00107. Based on this, the model equation (20) errors fulfill the assumption that they are independent and identically normally distributed with zero mean and constant variance. The next step is checking the error size of the model. This study measures the error size using the mean absolute percentage error (MAPE). The MAPE value of and is 1.5751% and 0.0833%, respectively. The MAPE value is less than 10%, even close to zero. Therefore, according to Wei [10], the VARX (7, 1) model is very feasible to forecast and in the next day based on the feasibility criteria of the MAPE-based model.

The feasibility of the VARX (7, 1) model is also checked using the visual comparison between the forecast results and the actual data. The visual comparison between the forecast results and the actual data is presented in Figure 4.

Figure 4 shows that the forecast line of and appears very close to the actual data of and . It means that the VARX (7, 1) model is visually feasible to forecast and in the next day. The VARX (7, 1) model is also practically forecast and in the next few days.

4.7. Forecasting

Forecasting , , and are carried out on six days, from 8 November 2021 until 13 November 2021. These forecasts are practically carried out using equation (19) for and and equation (20) for . The second-dose COVID-19 vaccination rates used for this forecast are the same as the actual on that date, 29.5200%, 29.7817%, 30.1154%, 30.4491%, 30.7828%, and 31.0465%. The , , and forecasting results obtained are presented in Table 7.

Table 7 shows that the forecasting results of , , and from 8 November 2021 until 13 November 2021 tend to decrease. To determine the accuracy of the forecast results presented in Table 7, we will look at the mean absolute percentage error (MAPE) size. The MAPE of , , and forecasting results on 8 November 2021 until 13 November 2021 is 2.2853%, 0.0051%, and 0.0082%, respectively. This MAPE of , , and forecasting results is tiny. Therefore, equations (19) and (20) are suitable for forecasting , , and for the next six days.

5. Discussions

First, an analysis of the influence of the second-dose COVID-19 vaccination rates on active and recovered case rates of COVID-19 is discussed. Generally, equation (19) shows that the second-dose COVID-19 vaccination rate in the previous three days influences active and recovered case rates of COVID-19 today. It does not always decrease active COVID-19 case rates but increases active COVID-19 case rates. It is also an increase and decrease in the COVID-19 recovered case rate. The active case rates of COVID-19 today are −1.17% of second-dose COVID-19 vaccination rate today, 24.37% of second-dose COVID-19 vaccination rate yesterday, −45.23% of second-dose COVID-19 vaccination rates the previous two days, and 22.03% of second-dose COVID-19 vaccination rate previous three days. A negative percentage means that the second-dose COVID-19 vaccination rate will decrease today’s COVID-19 active case rate. In contrast, a positive percentage means that the second-dose COVID-19 vaccination rate will increase today’s COVID-19 active case rate.

Meanwhile, the recovered case rates of COVID-19 today are 1.04% of the second-dose COVID-19 vaccination rate today, −23.70% of the second-dose COVID-19 vaccination rate yesterday, 44.28% of the second-dose COVID-19 vaccination rate previous two days, and −21.62% of second-dose COVID-19 vaccination rate previous three days. A positive percentage means that the second-dose COVID-19 vaccination rate will increase today’s COVID-19 recovered case rate. In contrast, a negative percentage means that the second-dose COVID-19 vaccination rate will reduce COVID-19 recovered case rate today. These facts do not mean that vaccination in Indonesia is not adequate. Suppose the percentage of the effect of today’s vaccination up to the previous three days on the active and recovered case rates is added. In that case, the results have negative and positive values, respectively. Negative values indicate that vaccination reduces the daily active case rate, and positive values indicate that vaccination generally increases the daily recovery case rate. Next is an analysis of the influence of active and recovered case rates of COVID-19 on itself. In general, equation (19) shows that the active and recovered rates of COVID-19 today are influenced by themselves on the previous three days. The percentage of each day can be seen in equation (19).

The last is an analysis of the effect of second-dose COVID-19 vaccination rates on 8 November 2021 on active and recovered case rates of COVID-19. Suppose the second-dose COVID-19 vaccination rates on 8 November 2021 are assumed not to increase from the previous day. Thus, the forecasted active and recovered case rates of COVID-19 on 8 November 2021 are 0.2321% and 96.3886%, respectively. The forecast value of the COVID-19 active case rates is greater than the forecast value obtained in the forecasting section. The forecast value of the COVID-19 recovered case rates is also smaller than that obtained in the forecasting section. It shows that if the second-dose COVID-19 vaccination rates on 8 November 2021 are not increased, the decline in the COVID-19 spread rate in Indonesia will be slower. Therefore, although the second-dose COVID-19 vaccination rates do not always decrease the COVID-19 active case rates and increase the COVID-19 recovered case rates every day, the second-dose COVID-19 vaccination rates on 8 November 2021 must still be increased. It also applies from 9 November 2021 until 13 November 2021.

6. Conclusion

The VARX model (7, 1) is the best VARX model to describe a causal relationship between active and recovered case rates of COVID-19 in Indonesia influenced by the exogenous variable, second-dose COVID-19 vaccination rates. The assumption of white noise is fulfilled, and the MAPE of this model is tiny, so this model is very suitable to be used in the forecasting process the next day. Forecasting results in the six days also give very close results to the actual data (even if the absolute percentage error is almost zero). There is an oddity in this VARX (7.1) model. The second-dose COVID-19 vaccination rates carried out daily should always reduce COVID-19 active case rates and increase COVID-19 recovered case rates.

However, in this model, the impact of the second-dose COVID-19 vaccination rates carried out in the previous three days on COVID-19 active case rates can decrease and increase simultaneously. It also happened to the impact of the second-dose COVID-19 vaccination rates on the COVID-19 recovered case rates. The second-dose COVID-19 vaccination rates carried out in the previous three days on COVID-19 recovered case rates can increase and decrease simultaneously. However, it does not indicate that vaccination does not affect daily activities and recovery case rates. The sum of the percentage effect of today’s vaccination to the previous three days on the active and recovered case rates has negative and positive values, respectively. These negative and positive values indicate that vaccination generally decreases the daily active case rate and increases the daily recovery case rate.

Another analysis result obtained is that if the second-dose COVID-19 vaccination rates do not increase from the previous day, COVID-19 active case rates will only decrease slightly, and COVID-19 recovered case rates will only increase slightly. It has caused the COVID-19 spread in Indonesia to decline more slowly. Based on this, the vaccination rate must still be increased. With this model, the Indonesian government can project the second-dose COVID-19 vaccination rates that must be carried out today so that active and recovered case rates of COVID-19 can be achieved by that date.

Data Availability

All data are obtained from the official website of the Science and Technology Index, National Research and Innovation Agency, Ministry of Research and Technology of the Republic of Indonesia.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This project was funded by the Padjadjaran University Internal Research Grant, the “RISET DATA PUSTAKA DAN DARING (RDPD)” program under Prof. Dr. Sukono, MM., M.Sc., with Number: 2203/UN6.3.1/PT.00/2022.