Article

Air Pollutant Analysis and AQI Prediction Based on GRA and Improved SOA-SVR by Considering COVID-19

1
School of Science, North University of China, Taiyuan 030051, China
2
School of Information and Communication Engineering, North University of China, Taiyuan 030051, China
*
Author to whom correspondence should be addressed.
Atmosphere 2021, 12(3), 336; https://doi.org/10.3390/atmos12030336
Submission received: 27 January 2021 / Revised: 22 February 2021 / Accepted: 28 February 2021 / Published: 5 March 2021
(This article belongs to the Special Issue Regional Air Quality Modeling)

Abstract

Since the outbreak of COVID-19 pneumonia, the Chinese government has taken a series of measures to control the spread of the epidemic, which made the air quality of Taiyuan in February 2020 significantly better than during the same period in previous years. In this paper, the Gray Relational Analysis (GRA) method was first applied to evaluate and analyze the influence of six major pollutants on air quality. Then, an improved seagull optimization algorithm (ISOA) was proposed and combined with Support Vector Regression (SVR) to establish a hybrid prediction model, ISOA-SVR. Finally, the proposed ISOA-SVR was used to predict the air quality index (AQI). The experimental results on two different kinds of data showed that the proposed ISOA-SVR had better generalization ability and robustness than the other prediction models, which indicates that ISOA-SVR is suitable for the prediction of AQI.

1. Introduction

With the concentration of urban population and industrial growth, environmental pollution problems are arising. Previous research showed that air pollution has a direct impact on human health. It is estimated that 1.22 million people are killed every year by haze in China, accounting for about 15% of annual deaths [1]. According to the data of the World Health Organization in 2018, air pollution causes seven million people to die of cancer, stroke, heart disease and respiratory diseases every year. Air pollution has become the number one killer of human beings. In 2019, the World Health Organization listed air pollution as one of the top ten threats to global health [2].
Air quality prediction is helpful to prevent air pollution from causing damage, and it can guide government departments to take corresponding preventive measures to prevent serious air pollution, such as heavy polluting enterprises stopping production and traffic restriction. At the same time, it can provide travel suggestions for people’s outdoor activities. Therefore, timely and reliable air quality forecasting is very necessary for the prevention and control of air pollution. Air quality index (AQI) is an important index to reflect and evaluate air quality [3]. According to China’s air quality standard, AQI is obtained from the concentrations of six main pollutants including fine particulate matter (PM2.5), inhalable particles (PM10), sulfur dioxide (SO2), carbon monoxide (CO), nitrogen dioxide (NO2) and ozone(O3) [4]. The value of the AQI index reflects the level of air quality; a low AQI value means that the air quality is good, and it is conducive to travel, whereas a higher AQI value indicates that the air pollution is serious.
In the face of increasingly serious air pollution problems, people are paying more and more attention to air quality. An autoregressive integrated moving average (ARIMA) model was used to predict the monthly value of the air pollution index and the daily value of AQI in [5,6]. However, if the sequence is nonlinear or irregular, ARIMA may not provide reliable prediction results. Compared with the ARIMA model, Support Vector Regression (SVR) [7,8] has been widely used in nonlinear systems because of its good generalization ability. Chen et al. used SVR and a multivariate analysis model to analyze daily precipitation [9]. In the literature [10,11], SVR was applied to predict short-term electrical load. Mauro Castelli et al. used an SVR model to predict AQI in California [12]. In the literature [13], an SVR model was used to predict the AQI value of Beijing and NOx values in India. A mixed prediction model based on the combination of fractional grey theory and SVM was established by Meng Dun et al. to predict the concentrations of PM10, PM2.5 and NO2 in Shijiazhuang and Chongqing [14]. However, the performance of SVR depends on the choice of SVR model parameters; the selection of parameters has a great influence on the learning and generalization ability of SVR. Therefore, it is very important to select optimal parameters to obtain good SVR performance. In fact, since the SVR model was proposed, many researchers have paid attention to the problem of SVR parameter optimization. In [15,16], the parameters of SVR were optimized by a genetic algorithm (GA). Wang and Zhang used a differential evolution algorithm (DE) to find optimal SVR parameters [17,18]. The optimization of SVR parameters by particle swarm optimization (PSO) has been studied in the literature [19,20,21]. In recent years, with the development of various new intelligent algorithms, there are more choices for optimizing SVR model parameters. Cao optimized SVR parameters with fruit fly optimization (FFO) [22], and optimization of SVR parameters with gray wolf optimization (GWO) appeared in [23].
In this paper, we first evaluate and analyze the air quality of Taiyuan city since the outbreak of COVID-19 using the grey relational analysis (GRA) method [24]. Then, three improvements to the seagull optimization algorithm (SOA) [25] are proposed. First, according to the flight characteristics of seagulls, the control parameter that steers the seagulls toward the best position is changed from a linear function to an exponential function. Second, considering the influence of both individual seagulls and the group, an adaptive nonlinear weight is introduced. Third, seagulls with poor fitness are given randomly selected new positions, which increases the diversity of samples and helps the optimization avoid local minima. On this basis, we propose an improved seagull optimization algorithm (ISOA) and combine it with support vector regression (SVR) to establish a hybrid prediction model, ISOA-SVR. Finally, the proposed ISOA-SVR method is used to predict the air quality index (AQI). In addition, we combine particle swarm optimization (PSO) [26], the sine cosine algorithm (SCA) [27], the whale optimization algorithm (WOA) [28] and the SOA algorithm with SVR, respectively, to establish PSO-SVR, SCA-SVR, WOA-SVR and SOA-SVR, and compare each of them with the experimental results of ISOA-SVR. The experimental results on two different data sets showed that the proposed ISOA-SVR method had the best performance, and they support the generalization ability and robustness of the model.
The main structure of this paper is as follows. In Section 2, the grey relational analysis method and SVR model are briefly reviewed. The improved SOA algorithm (ISOA) and proposed ISOA-SVR integration model are introduced in Section 3. Section 4 introduces the experimental materials, including data, experimental environment and evaluation index. Section 5.1 includes the evaluation and analysis of the air quality of Taiyuan city before and after the outbreak of COVID-19 by using the grey relational analysis method. The comparative experiments on the prediction of AQI values using SVR prediction models optimized by different optimization algorithms are arranged in Section 5.2. Finally, Section 6 concludes and discusses the paper.

2. Materials and Methods

2.1. Grey Relational Analysis (GRA)

Grey relational analysis (GRA) is a method to measure the degree of relation between factors according to the degree of similarity or difference in their development trends [24]. If the change trends of two factors are consistent, that is, the degree of synchronous change is high, the relational degree between the two factors is high; otherwise it is low. The specific steps of the grey relational method are as follows:
Step 1: Determine the analysis sequence.
The reference sequence reflecting the characteristics of system behavior and the data sequence of factors influencing system behavior are determined.
Step 2: Dimensionless variables.
Because the factor columns in the system may have different dimensions (units and magnitudes), their raw data are inconvenient to compare directly and may lead to incorrect conclusions. Therefore, to ensure the reliability of the results, dimensionless processing should be carried out first in the grey relational analysis.
Step 3: Calculate the relational coefficient.
The relational coefficient of comparison sequence $x_i$ ($i = 1, 2, \ldots, n$) to reference sequence $x_0$ at time $t$ is as follows:
$$\xi_i(t) = \frac{\min_i(\Delta_i(\min)) + \rho \max_i(\Delta_i(\max))}{|x_0(t) - x_i(t)| + \rho \max_i(\Delta_i(\max))},$$
where $\rho$ is the resolution coefficient, usually a number in (0, 1), and
$$\min_i(\Delta_i(\min)) = \min_i\left(\min_t |x_0(t) - x_i(t)|\right),$$
$$\max_i(\Delta_i(\max)) = \max_i\left(\max_t |x_0(t) - x_i(t)|\right).$$
Step 4: Calculate the relational degree.
Because the relational coefficient is the value of the relational degree between the comparison sequence and the reference sequence at each time (i.e., each point in the curve), there is more than one of them and the information is too scattered to facilitate the overall comparison. Therefore, it is necessary to concentrate the relational coefficient of each time into one value; that is, to calculate the average value as the quantitative expression of the relational degree between the comparison sequence and the reference sequence. The relational degree is calculated as follows:
$$r_i = \frac{1}{N} \sum_{t=1}^{N} \xi_i(t)$$
Step 5: Rank the relational degree. The larger the value, the stronger the relation.
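The five steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used in the paper: the function name `grey_relational_degrees`, the default `rho=0.5` and the sample values are assumptions introduced here, and the dimensionless step uses division by the sequence mean.

```python
def grey_relational_degrees(reference, factors, rho=0.5):
    """Return the grey relational degree of each factor sequence to the reference.

    rho is the resolution coefficient; 0.5 is an assumed default in (0, 1).
    """
    def mean_scale(seq):
        # Step 2: dimensionless processing, here division by the sequence mean.
        mean = sum(seq) / len(seq)
        return [v / mean for v in seq]

    x0 = mean_scale(reference)
    xs = [mean_scale(f) for f in factors]

    # Absolute differences |x0(t) - xi(t)| for every factor i and time t.
    deltas = [[abs(a - b) for a, b in zip(x0, xi)] for xi in xs]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every factor coincides with the reference after scaling.
        return [1.0 for _ in xs]

    degrees = []
    for row in deltas:
        # Step 3: relational coefficient at each time point.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        # Step 4: relational degree is the mean coefficient.
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees


# Illustrative values only, not the data used in the paper.
aqi = [80.0, 120.0, 100.0, 150.0]
pm25 = [40.0, 60.0, 50.0, 75.0]   # exactly proportional to aqi
o3 = [90.0, 30.0, 70.0, 20.0]     # moves against aqi
degrees = grey_relational_degrees(aqi, [pm25, o3])
# Step 5: rank factor indices from the strongest to the weakest relation.
ranking = sorted(range(len(degrees)), key=lambda i: degrees[i], reverse=True)
```

In this toy input the sequence proportional to the reference obtains a degree of essentially 1, while the sequence that moves against it obtains a smaller degree, so it is ranked second.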

2.2. Support Vector Regression (SVR)

Support Vector Regression (SVR) is a support vector algorithm for regression problems [7]. The goal of SVR is to find an optimal hyperplane, so that the deviations between training data and the hyperplane are not greater than $\varepsilon$. In other words, it minimizes the distance between the support vectors on both sides of the hyperplane. A schematic diagram of SVR is shown in Figure 1.
Therefore, a regression function $f(x)$ can be obtained by training SVR so that its deviation from the actual value $y$ is not more than $\varepsilon$ for each training vector $x$. The details of the SVR are presented in reference [8]. The regression function $f(x)$ can be obtained as:
$$f(x) = \sum_{i=1}^{m} (\alpha_i - \alpha_i^*) K(x, x_i) + b$$
where $x$ is the input feature vector, $b$ is the bias term, $\alpha_i$ and $\alpha_i^*$ are Lagrange multipliers and $K(x, x_i)$ is a kernel function, which can take the following forms:
$$\begin{aligned}
\text{linear kernel function:}\quad & K(x, x_i) = x^T x_i. \\
\text{polynomial kernel function:}\quad & K(x, x_i) = (\gamma x^T x_i + 1)^d, \quad d \in (1, n), \quad \gamma > 0. \\
\text{radial basis kernel function:}\quad & K(x, x_i) = \exp(-\gamma \|x - x_i\|^2), \quad \gamma > 0.
\end{aligned}$$
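To make the regression function and the three kernels concrete, the following sketch evaluates them with plain Python lists. The function names and default parameter values are assumptions made for illustration; `dual_coefs[i]` stands for the difference of the two Lagrange multipliers of the $i$-th support vector, which a trained SVR would supply.

```python
import math


def linear_kernel(x, xi):
    # K(x, xi) = x^T xi
    return sum(a * b for a, b in zip(x, xi))


def polynomial_kernel(x, xi, gamma=1.0, d=2):
    # K(x, xi) = (gamma * x^T xi + 1)^d, with gamma > 0
    return (gamma * linear_kernel(x, xi) + 1.0) ** d


def rbf_kernel(x, xi, gamma=1.0):
    # K(x, xi) = exp(-gamma * ||x - xi||^2), with gamma > 0
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, support_vectors, dual_coefs, b, kernel=rbf_kernel):
    """Evaluate f(x) = sum_i dual_coefs[i] * K(x, x_i) + b."""
    return sum(c * kernel(x, sv) for c, sv in zip(dual_coefs, support_vectors)) + b


# Hypothetical support vectors and coefficients, for illustration only.
prediction = svr_predict([0.0], [[0.0], [1.0]], [2.0, -1.0], 0.5, kernel=linear_kernel)
```

The kernel is passed as a function so that the same prediction routine can be reused with any of the three forms.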

3. Proposed Method

3.1. Seagull Optimization Algorithm (SOA)

The seagull optimization algorithm (SOA) is a swarm intelligence optimization algorithm proposed by Dhiman and Kumar in 2019 [25]. In this algorithm, the migration of seagulls and the attack behavior in the migration process are regarded as the optimization process of the optimization problem, and the location of seagulls in the algorithm is the potential solution of the optimization problem. The migration and attack patterns of seagulls are shown in Figure 2.
1. Migration behavior of seagulls (global search)
In this part, the algorithm simulates how the group of seagulls move from one position to another. In the process of migration, the movement behavior of seagulls satisfies the following three conditions:
(1).
Collision avoidance. An additional variable $A$ is employed in the calculation of the new seagull position to avoid collisions between neighboring seagulls.
$$A = a - a\,(t/T) \tag{6}$$
$$C = A \times P \tag{7}$$
where $C$ is defined as a new position that does not collide with other seagulls. The current position of the seagull is represented by $P$. $t$ and $T$ are the current iteration and the maximum number of iterations, respectively. $A$ represents the movement behavior of seagulls in a given search space. The constant $a$ is used to control the range of $A$; when $a = 2$, $A$ decreases linearly from 2 to 0.
(2).
Movement towards global best position direction: After avoiding the overlap with other seagulls, the seagulls move toward the direction of global best position.
$$B = 2 \times A^2 \times b_{rand}, \tag{8}$$
$$M = B \times (P_{best} - P) \tag{9}$$
where $M$ is the step size to move from the current position to the global optimal position and $P_{best}$ is the global best position. The control parameter used to balance global search and local search is defined as $B$, where $b_{rand}$ is a random number that lies in the range of (0, 1).
(3).
Moving distance. When the seagull moves to a position where it does not collide with other seagulls, it moves towards the direction of the global best position. $D$ is the distance that the seagull moves from the current position towards the global best position.
$$D = |C + M|. \tag{10}$$
2. Attacking behavior of seagulls (local search)
Seagulls use their wings and weight to maintain their flight height, and they can constantly change their attack angle and speed during migration. Seagulls move in a spiral pattern when attacking their prey. The spiral movement behavior of seagulls in the $x$, $y$ and $z$ planes is described by Equations (11)–(14).
$$x = r \cos(\theta), \tag{11}$$
$$y = r \sin(\theta), \tag{12}$$
$$z = r \theta, \tag{13}$$
$$r = u e^{\theta v}, \tag{14}$$
where $r$ is the radius of the spiral, $\theta$ is a random angle value in the range of $[0, 2\pi]$, $u$ and $v$ are constants that define the spiral shape and $e$ is the base of the natural logarithm. The updated position $P(t)$ of seagulls is obtained using Equation (15).
$$P = x \times y \times z \times D + P_{best} \tag{15}$$
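As a rough illustration of how Equations (6)–(15) fit together, the sketch below performs one position update for a single seagull stored as a list of floats. It is a simplified reading rather than the reference implementation: the function name `soa_update`, the defaults `u=1.0` and `v=1.0`, and the choice to draw one random number and one angle per update (instead of per dimension) are assumptions made here.

```python
import math
import random


def soa_update(position, best, t, T, a=2.0, u=1.0, v=1.0, rng=random):
    """Return the updated position of one seagull at iteration t of T."""
    A = a - a * (t / T)                       # Equation (6): linear decrease from a to 0
    B = 2.0 * A ** 2 * rng.random()           # Equation (8): random balance parameter
    theta = rng.uniform(0.0, 2.0 * math.pi)   # random attack angle
    r = u * math.exp(theta * v)               # Equation (14): spiral radius
    # Product x * y * z of Equations (11)-(13).
    spiral = (r * math.cos(theta)) * (r * math.sin(theta)) * (r * theta)

    new_position = []
    for p, pb in zip(position, best):
        C = A * p              # Equation (7): collision-free position
        M = B * (pb - p)       # Equation (9): step toward the best position
        D = abs(C + M)         # Equation (10): distance to the best position
        new_position.append(spiral * D + pb)  # Equation (15)
    return new_position


# Illustrative call on a two-dimensional problem.
moved = soa_update([1.0, -2.0], [0.5, 0.5], t=3, T=10)
```

Because $A$ and $B$ both reach zero at the final iteration, the update collapses onto the global best position when $t = T$, which matches the intended shift from global to local search.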

3.2. Improved Seagull Optimization Algorithm (ISOA)

In this part, to keep the SOA algorithm from falling into a local minimum and to accelerate its convergence during optimization, the SOA algorithm is improved in three respects.
(1).
As we know, a good optimization algorithm should have strong exploration ability in the early iterations and good exploitation ability in the late iterations. In addition, compared with a linear function, a nonlinear function can achieve a faster global search in the early stage and a slower local search in the later stage. Therefore, in the seagull optimization algorithm, we change the parameter $B$, which steers the seagull towards the optimal position, from a linear function to an exponential function.
$$B = e^{A} \times b_{rand} \tag{16}$$
(2).
From the SOA algorithm, we can see that the new position of the seagulls is mainly affected by the global optimal position, but when we observe the foraging of seagulls, we can find that the flight of seagulls is also affected by the individual optimal value of seagulls. In order to solve this problem, we consider the influence of the particle itself, and introduce the adaptive variable weight parameter ω .
$$\omega = \begin{cases} \omega_{\max} - (\omega_{\max} - \omega_{\min}) \left( \dfrac{fit_{pbest} - fit_{gbest}}{avgfit - fit_{gbest}} \right) \dfrac{t}{T}, & fit \le avgfit \\ \omega_{\max}, & fit > avgfit \end{cases} \tag{17}$$
where $\omega_{\max}$ and $\omega_{\min}$ represent the maximum and minimum values of the weight $\omega$, respectively, $fit_{pbest}$ represents the individual optimal value of particle fitness, the global optimal value of particle fitness is defined as $fit_{gbest}$, and $avgfit$ is the average fitness value of the current particle population. $t$ and $T$ are the current iteration and the maximum number of iterations, respectively.
It can be seen from Equation (17) that when the target values of the particles tend to be consistent or approach a local optimum, the weight $\omega$ is larger, whereas when the target values of the particles are dispersed, the weight $\omega$ is smaller. For a particle with an objective function value better than the average target value, the corresponding weight $\omega$ is smaller, so that the particle is retained. On the contrary, for a particle with an objective function value worse than the average target value, the corresponding weight $\omega$ is larger, so that the particle moves closer to the better search area.
(3).
Increase the diversity of particles to avoid the algorithm falling into a local minimum. First, the fitness of particles is sorted from small to large. Then, the particles with poor ranking generate a new position near the global optimal position, while the other particles are retained.
To summarize, the attack position $P_i$ of the $i$th seagull follows:
$$P_i = \begin{cases} (1 - \omega) \times x_i \times y_i \times z_i \times D_i + \omega P_{best}, & i < popsize \times 0.8 \\ (1 - \lambda) P_{best} + \lambda r_p, & \text{other} \end{cases} \tag{18}$$
where $i$ is the new serial number of particles after fitness sorting, population size is defined as $popsize$, $\lambda$ is a constant and $r_p$ is a random value in the range of (−1, 1).
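The adaptive weight and the ranked position update can be sketched as two small functions. This is one possible reading of Equations (17) and (18), assuming a minimization problem in which the ratio term is multiplied by $t/T$; the names `adaptive_weight` and `isoa_attack_position`, the defaults `w_max=0.9`, `w_min=0.4` and `lam=0.5`, and the zero-based rank are all assumptions made for illustration and are not values stated in the text.

```python
import random


def adaptive_weight(fit_pbest, fit_gbest, avg_fit, t, T, w_max=0.9, w_min=0.4):
    """Adaptive weight for a minimization problem (smaller fitness is better)."""
    # Worse-than-average particles, or a fully converged population, get w_max.
    if fit_pbest > avg_fit or avg_fit == fit_gbest:
        return w_max
    ratio = (fit_pbest - fit_gbest) / (avg_fit - fit_gbest)
    return w_max - (w_max - w_min) * ratio * (t / T)


def isoa_attack_position(rank, pop_size, spiral, D, best, w, lam=0.5, rng=random):
    """Position update for the seagull ranked `rank` (0 = best fitness).

    spiral is the product x * y * z of the spiral equations; D and best are
    per-dimension lists for this seagull.
    """
    if rank < 0.8 * pop_size:
        # Top 80%: weighted mix of the spiral attack term and the global best.
        return [(1.0 - w) * spiral * d + w * pb for d, pb in zip(D, best)]
    # Remaining 20%: regenerate near the global best with a random offset in (-1, 1).
    return [(1.0 - lam) * pb + lam * rng.uniform(-1.0, 1.0) for pb in best]


# Illustrative calls with made-up fitness values.
w = adaptive_weight(fit_pbest=2.0, fit_gbest=1.0, avg_fit=3.0, t=5, T=10)
position = isoa_attack_position(rank=0, pop_size=10, spiral=2.0, D=[1.0], best=[3.0], w=0.5)
```

Guarding the case `avg_fit == fit_gbest` avoids a division by zero when every particle has the same fitness, which is the situation the text describes as the target values tending to be consistent.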

3.3. Proposed ISOA-SVR Prediction Model

In this study, the ISOA algorithm was used to optimize the penalty parameter $c$, width parameter $g$ and loss parameter $p$ in SVR. The mean square error (MSE) is the fitness function of the ISOA algorithm. The fitness function is defined by the following equation:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{19}$$
where $n$ is the number of samples and $y_i$ and $\hat{y}_i$ are the actual and the forecasted AQI values, respectively.
The algorithm flow of the proposed ISOA-SVR prediction model is as follows:
Step 1:
Set the control parameters of the proposed ISOA algorithm.
Step 2:
Initialize the population $P$.
Step 3:
Map $P$ into the $c$, $g$ and $p$ of the SVR and calculate its fitness $f$ using Equation (19); then calculate $fit_{pbest}$, $fit_{gbest}$ and $avgfit$.
Step 4:
Update weight parameters ω by Equation (17).
Step 5:
Update C by Equations (6) and (7).
Step 6:
Update M by Equations (9) and (16).
Step 7:
Update D by Equation (10).
Step 8:
Update attack positions P according to Equations (11)–(15).
Step 9:
The fitness values f are sorted from small to large. Update the attack position P according to Equation (18).
Step 10:
If the stopping criterion is met, then go to Step 11. Otherwise, go to Step 3.
Step 11:
Output the best position, which is mapped into the $c$, $g$ and $p$ of the SVR.
Then train and test the SVR.
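The flow of Steps 1–11 can be illustrated with a compact optimization loop. The sketch below is a simplification under several stated assumptions and should not be read as the authors' code: the spiral attack and the ranked update are merged into a single pass, candidates are clipped to box bounds, Equation (16) is read as $e^{A}$ multiplied by a random number, Equation (17) is read with the ratio multiplied by $t/T$, and a toy sphere function stands in for the SVR training MSE. All names, defaults and bounds are hypothetical.

```python
import math
import random


def isoa_minimize(fitness, bounds, pop_size=20, T=200, a=2.0, u=1.0, v=1.0,
                  w_max=0.9, w_min=0.4, lam=0.5, seed=0):
    """Minimize `fitness` over box `bounds` with a simplified ISOA-style loop."""
    rng = random.Random(seed)

    def clip(x, lo, hi):
        return max(lo, min(hi, x))

    # Step 2: initialize the population uniformly inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    pbest_fit = [float("inf")] * pop_size
    best, best_fit = None, float("inf")

    for t in range(1, T + 1):
        # Step 3: evaluate fitness and update individual and global bests.
        fits = [fitness(p) for p in pop]
        for i, f in enumerate(fits):
            pbest_fit[i] = min(pbest_fit[i], f)
            if f < best_fit:
                best, best_fit = pop[i][:], f
        avg_fit = sum(fits) / pop_size
        A = a - a * (t / T)
        # Step 9: rank seagulls from the smallest to the largest fitness.
        order = sorted(range(pop_size), key=lambda i: fits[i])
        new_pop = [None] * pop_size
        for rank, i in enumerate(order):
            if rank < 0.8 * pop_size:
                # Step 4: adaptive weight (assumed reading of Equation (17)).
                if pbest_fit[i] > avg_fit or avg_fit == best_fit:
                    w = w_max
                else:
                    ratio = (pbest_fit[i] - best_fit) / (avg_fit - best_fit)
                    w = w_max - (w_max - w_min) * ratio * (t / T)
                # Steps 5-8: C, M, D and the spiral attack term.
                B = math.exp(A) * rng.random()  # assumed reading of Equation (16)
                theta = rng.uniform(0.0, 2.0 * math.pi)
                r = u * math.exp(theta * v)
                spiral = (r * math.cos(theta)) * (r * math.sin(theta)) * (r * theta)
                cand = [(1.0 - w) * spiral * abs(A * p + B * (pb - p)) + w * pb
                        for p, pb in zip(pop[i], best)]
            else:
                # Worst 20%: new random position near the global best.
                cand = [(1.0 - lam) * pb + lam * rng.uniform(-1.0, 1.0) for pb in best]
            new_pop[i] = [clip(x, lo, hi) for x, (lo, hi) in zip(cand, bounds)]
        pop = new_pop
    # Step 11: return the best evaluated position and its fitness.
    return best, best_fit


def sphere(p):
    # Toy objective standing in for the SVR training MSE.
    return sum(x * x for x in p)


# Three dimensions mirror (c, g, p); the bounds are hypothetical.
best, best_fit = isoa_minimize(sphere, [(-5.0, 5.0)] * 3, T=50)
```

To use such a loop for SVR tuning, `fitness` would train an SVR with the candidate $(c, g, p)$ and return its MSE from Equation (19); that part is deliberately left out here because it depends on the SVR library in use.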

4. Experimental Preparation

4.1. Data Description and Preprocessing

The data used in this study comes from the historical data of Taiyuan city released by the historical data query of China air quality online monitoring and analysis platform (https://www.aqistudy.cn/historydata/ accessed on 15 July 2020). The daily and monthly data of air quality index (AQI) and six closely related pollutants (PM2.5, PM10, SO2, CO, NO2, O3) in Taiyuan from December 2013 to June 2020 were extracted. The daily data is the average result of the real-time dynamic data released by the environmental protection station on that day, and the monthly data is the average value of all the daily data of that month.
Because the indexes involved in the calculation of the grey relational degree and in AQI prediction differ in numerical dimension, this study carried out dimensionless processing on the data in both the grey relational analysis (GRA) and AQI prediction. When calculating the grey relational degree, we used the mean method of Equation (20) to make the data dimensionless. In addition, for AQI prediction, we used Equation (21) to normalize the data to the range [0, 1].
$$x_{new} = \frac{x}{\bar{x}} \tag{20}$$
where $x_{new}$ is the dimensionless value of $x$ and $\bar{x}$ is the mean value of the sample during the experimental period.
$$x_{norm} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{21}$$
where $x_{norm}$ is the normalized value and $x_{\min}$ and $x_{\max}$ represent the minimum and maximum values of experimental data in each experiment.
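Both preprocessing formulas are short enough to show directly. The following sketch applies Equations (20) and (21) to a plain list of values; the function names and the handling of a constant sequence are assumptions added for illustration.

```python
def mean_scale(values):
    """Equation (20): divide every value by the sample mean."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]


def min_max_scale(values):
    """Equation (21): map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant sequence: the formula is undefined, so return zeros (assumed choice).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative values only.
scaled_by_mean = mean_scale([1.0, 2.0, 3.0])
normalized = min_max_scale([10.0, 20.0, 30.0])
```

Mean scaling keeps the relative shape of each sequence, which is what the grey relational coefficient compares, whereas min-max scaling bounds the SVR inputs.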

4.2. Experimental Environment

The experiments were run in MATLAB 2016b (MathWorks, Natick, MA, USA) on Windows 10 (Microsoft, Redmond, WA, USA) with an Intel Core i7-8700 CPU (Intel, Santa Clara, CA, USA) at 3.20 GHz and 8 GB of RAM. All of the algorithms in the experiments ran independently 30 times with different randomly initialized populations. To verify the efficiency of the proposed ISOA, it was compared with SOA, PSO, SCA and WOA. The maximum number of iterations T was set to 200 for all algorithms and the population size N was set to 20. The other parameters of SCA were similar to those in [27], the WOA parameters were similar to those in [28] and the SOA parameters were similar to those in [25]. For PSO, acceleration coefficients c1 and c2 were set to 1.49445.

4.3. Evaluation of Prediction Effect

In this study, ISOA-SVR, SOA-SVR, PSO-SVR, SCA-SVR and WOA-SVR were used to establish the prediction model of AQI in Taiyuan. Mean square error (MSE), relative mean square error (RMSE) and mean absolute percentage error (MAPE) were the evaluation indexes of the above five models. The smaller the MSE, RMSE and MAPE between the predicted value and the actual value, the better the regression model. The mathematical expressions of the evaluation indexes are shown in Equations (22)–(24), where $y_i$ is the label of the $i$th sample and $\hat{y}_i$ is the predicted value of the $i$th sample.
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \tag{22}$$
$$RMSE = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{\hat{y}_i} \right)^2, \tag{23}$$
$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{\hat{y}_i} \right| \times 100\% \tag{24}$$
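The three evaluation indexes can be computed as follows. This is a minimal sketch of Equations (22)–(24) as they are written in this section, with the predicted value in the denominator of the relative errors; the function names and sample numbers are illustrative assumptions.

```python
def mse(y_true, y_pred):
    """Equation (22): mean square error."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def relative_mse(y_true, y_pred):
    """Equation (23): mean of squared relative errors (called RMSE in this paper)."""
    return sum(((y - p) / p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Equation (24): mean absolute percentage error, in percent."""
    return sum(abs((y - p) / p) for y, p in zip(y_true, y_pred)) / len(y_true) * 100.0


# Made-up labels and predictions, not results from the paper.
labels = [100.0, 50.0]
predictions = [50.0, 100.0]
scores = (mse(labels, predictions), relative_mse(labels, predictions), mape(labels, predictions))
```

Note that RMSE here denotes a relative mean square error, not the root mean square error that the abbreviation usually stands for, so the values are not comparable with root-MSE figures reported elsewhere.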

5. Experimental Results

In this section, GRA was used to evaluate and analyze the air quality of Taiyuan city before and during the outbreak of COVID-19. Then, a prediction model of AQI in Taiyuan was established by using ISOA-SVR and compared with other prediction models.

5.1. Analysis of Air Pollution Assessment Results

Before applying the GRA method to analyze the influence of air pollution factors on AQI, a simple numerical statistical analysis of AQI and each air pollution factor in four time periods was carried out. The four time periods are the COVID-19 lockdown period (25 January 2020 to 24 February 2020), the period before the outbreak of COVID-19 (25 December 2019 to 24 January 2020), the same period in 2019 (25 January 2019 to 24 February 2019) and a fourth period (25 December 2018 to 24 January 2019), each of which has 31 days. In addition, according to the technical regulation of ambient air quality index (AQI) (HJ 1130-2020) issued by the Ministry of Environmental Protection of China, AQI can be divided into six levels (0~50 ‘good’, 51~100 ‘moderate’, 101~150 ‘lightly polluted’, 151~200 ‘moderately polluted’, 201~300 ‘heavily polluted’ and >300 ‘severely polluted’) [29]. This paper takes these as the standards to analyze the air pollution in the above four periods. The statistical results of AQI are shown in Figure 3 and Table 1. In the 31 days of the COVID-19 lockdown period (25 January 2020 to 24 February 2020), there were three days of good, 16 days of moderate, nine days of lightly polluted and three days of heavily polluted air quality, and there was no moderately polluted or severely polluted weather. In the 31 days before the outbreak of COVID-19 (25 December 2019 to 24 January 2020), the days with good and moderate grades were two and seven, respectively, the days with lightly polluted, moderately polluted and heavily polluted air were 11, two and eight, respectively, and there was one day with severely polluted air quality. In the same period of 2019 (25 January 2019 to 24 February 2019), the days of good and moderate were one and 13, respectively, the days of lightly polluted, moderately polluted and heavily polluted were six, eight and three, respectively, and there was no severely polluted weather. In the fourth period (25 December 2018 to 24 January 2019), there were zero good and 11 moderate days, 10 lightly polluted days, four moderately polluted days, four heavily polluted days and two severely polluted days. From these statistics, it can be seen that the air quality of the COVID-19 lockdown period (25 January 2020 to 24 February 2020) was significantly better than in the other three periods. In fact, during the COVID-19 lockdown period, the total number of days with good and moderate air quality was 19 days, accounting for 61.3% (19/31), and there was no severely polluted weather. However, in the other three time periods, the total numbers of good and moderate days were 9, 14 and 11, respectively, none of which reached 50%. In the COVID-19 lockdown period (25 January 2020 to 24 February 2020), only three days reached or exceeded the moderately polluted level, accounting for 9.7% (3/31), whereas the other three time periods had 11 days, 11 days and 10 days, respectively, each of which exceeded 30%. In addition, before the outbreak of COVID-19 (25 December 2019 to 24 January 2020) and in the fourth period (25 December 2018 to 24 January 2019), there were one day and two days of severely polluted weather, respectively. All these indicate that during the period of epidemic prevention and control, the shutdown of factories and the restriction of motor vehicles played a positive role in improving air quality.
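The level boundaries and the two headline percentages for the lockdown period can be reproduced with a short script. The sketch below encodes the six AQI levels quoted above and the lockdown-period day counts reported in the text; the names `AQI_LEVELS`, `aqi_level` and `lockdown_counts` are introduced here for illustration only.

```python
# Upper bound of each AQI level, in ascending order, as quoted in the text.
AQI_LEVELS = [
    (50, "good"),
    (100, "moderate"),
    (150, "lightly polluted"),
    (200, "moderately polluted"),
    (300, "heavily polluted"),
]


def aqi_level(aqi):
    """Return the level name for a daily AQI value."""
    for upper, name in AQI_LEVELS:
        if aqi <= upper:
            return name
    return "severely polluted"


# Day counts for the lockdown period (25 January to 24 February 2020), from the text.
lockdown_counts = {
    "good": 3,
    "moderate": 16,
    "lightly polluted": 9,
    "moderately polluted": 0,
    "heavily polluted": 3,
    "severely polluted": 0,
}
total_days = sum(lockdown_counts.values())

# Share of days that were good or moderate, in percent.
good_or_moderate = 100.0 * (lockdown_counts["good"] + lockdown_counts["moderate"]) / total_days

# Share of days at or above the moderately polluted level, in percent.
polluted = 100.0 * sum(
    lockdown_counts[name]
    for name in ("moderately polluted", "heavily polluted", "severely polluted")
) / total_days
```

Rounded to one decimal place, the two shares are 61.3% and 9.7%, matching the 19/31 and 3/31 figures given above.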
Figure 4 and Table 2 show the average values of AQI and pollutant concentrations over the above four time periods. It can be seen that, compared with the other three periods, the average value of AQI and the average concentrations of all pollutants except O3 decreased significantly during the COVID-19 lockdown period (25 January 2020 to 24 February 2020), whereas the average concentration of O3 increased. The average value reflects the overall level of the data. Combining these results with the statistical results in Table 1 shows that the air quality in the COVID-19 lockdown period was significantly better than in the other three periods. In fact, days with good or moderate air quality accounted for 61.3% (19/31) of the COVID-19 lockdown period (25 January 2020 to 24 February 2020). Compared with 29.0% (9/31) before the outbreak of COVID-19, 45.2% (14/31) in the same period of 2019 and 35.5% (11/31) in the fourth period, this is 32.3 (10/31), 16.1 (5/31) and 25.8 (8/31) percentage points more, respectively. The number of days reaching or exceeding the moderately polluted level accounted for 9.7% (3/31), which was 25.8 (8/31), 25.8 (8/31) and 22.6 (7/31) percentage points less than before the outbreak of COVID-19 (35.5% (11/31)), in the same period in 2019 (35.5% (11/31)) and in the fourth period (32.3% (10/31)), respectively. There was no severely polluted weather during the COVID-19 lockdown period. In addition, the average value of AQI during the COVID-19 lockdown period was 102, which is smaller than in the other three time periods (141, 122, 141). These results show that the prevention and control measures of COVID-19 placed great restrictions on the movement of people, transportation, engineering construction, industrial production and commercial trade activities. Industrial emissions and automobile exhaust were greatly reduced, and air quality was significantly improved.
From the literature [29], we know that AQI is calculated from the concentration values of PM2.5, PM10, SO2, CO, NO2, O3 and other pollutants. From the previous numerical statistical analysis, we know that, compared with the other three time periods, the AQI and the pollutant concentrations changed greatly during the COVID-19 lockdown period. So, did the impact of these air pollutants on AQI change greatly? Next, we used the GRA method to analyze the gray relational degree of each pollution factor and AQI in the above four time periods, and made a comparative analysis of the situation in different time periods. The analysis and comparison results are shown in Figure 5 and Table 3, Table 4, Table 5 and Table 6.
It can be seen from Table 3 and Table 4 that during the COVID-19 lockdown period (25 January 2020 to 24 February 2020), compared with before the outbreak of COVID-19 (25 December 2019 to 24 January 2020), the impact of PM10 was slightly enhanced, the impact of PM2.5, SO2 and CO was basically the same as that in the earlier period, the impact of NO2 was slightly weakened and the impact of O3 was significantly enhanced. Compared with the same period in 2019 (25 January 2019 to 24 February 2019), the impacts of PM10 and O3 were significantly enhanced, the impacts of PM2.5 and CO were basically the same as before and the impacts of SO2 and NO2 were slightly weakened.
The results in Table 5 show that in the same period of 2019 (25 January 2019 to 24 February 2019) compared with its earlier period (25 December 2018 to 24 January 2019), the impact of PM2.5, SO2 and NO2 was significantly weakened, the impact of PM10 was basically the same as that of the earlier period, the impact of CO was slightly weakened and the impact of O3 was slightly enhanced. It can be seen from Table 6 that before the outbreak of COVID-19 (25 December 2019 to 24 January 2020), compared with the same period in 2018 (25 December 2018 to 24 January 2019), the impact of PM2.5, SO2 and NO2 was significantly weakened, the impact of PM10 was significantly enhanced and the impact of CO and O3 was significantly enhanced.
Based on the analysis of the results in Table 3, Table 4, Table 5 and Table 6 and Figure 5, it can be seen that although there were some differences in the impact changes of pollutants on air quality (AQI) in different comparison periods, the changes were similar in the corresponding comparison periods. PM10 and PM2.5 had great influence on AQI. These results indicate that although the impact of various air pollutants on AQI had some changes due to the outbreak of COVID-19, most of the changes were not significant. Therefore, it is feasible to use the data before the outbreak of COVID-19 to predict the data during the outbreak of COVID-19, but there are also some challenges.

5.2. AQI Prediction Results and Analysis

In this part, we used five prediction models, namely ISOA-SVR, SOA-SVR, PSO-SVR, SCA-SVR and WOA-SVR, to predict the monthly average AQI values from September 2019 to June 2020, and the daily AQI values for 15–24 February 2020. In addition, to ensure the reliability of the experiment, each group of experiments of each model was run separately 30 times, and the prediction results were compared and analyzed.
The sampling time of Experiment 1 was from December 2013 to June 2020, with a total of 79 groups of monthly average air quality data. The first 69 groups were used as the training set and the remaining 10 groups as the test set. The sampling time of Experiment 2 was from 1 January 2020 to 24 February 2020. There were 55 groups of daily air quality data, including the first 45 groups as the training set and the remaining 10 groups as the test set. In the experiment, the AQI value of the sixth group was predicted by inputting the AQI value of the first five groups and the pollutant value of the fifth group, and then sliding forward was carried out to eliminate the random fluctuation in the historical statistical series, and to eliminate the influence of season and random factors to make more accurate predictions. The fitness convergence curve of training data and the test results of test data (including MSE, RMSE, MAPE and run time) were obtained by running each prediction model.
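The sliding-window construction described above can be sketched as follows: each sample combines the AQI values of five consecutive groups with the pollutant values of the fifth group, and the target is the AQI value of the sixth group. The function name `build_samples`, the argument layout and the toy data are assumptions made for illustration; how the resulting samples map onto the training and test split is not specified here.

```python
def build_samples(aqi, pollutants, window=5):
    """Build (features, targets) by sliding a window over the series.

    aqi: list of AQI values, one per group (day or month).
    pollutants: list of per-group pollutant value lists, aligned with aqi.
    """
    features, targets = [], []
    for end in range(window, len(aqi)):
        # AQI of the previous `window` groups plus pollutants of the last of them.
        row = list(aqi[end - window:end]) + list(pollutants[end - 1])
        features.append(row)
        targets.append(aqi[end])  # AQI of the next group is the label
    return features, targets


# Toy series: 8 groups with six pollutant values each (made-up numbers).
toy_aqi = [float(i) for i in range(8)]
toy_pollutants = [[i * 10.0] * 6 for i in range(8)]
X, y = build_samples(toy_aqi, toy_pollutants)
```

With five AQI values and six pollutant concentrations, every feature row has 11 entries, and a series of n groups yields n − 5 samples.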

5.2.1. Results and Analysis of Experiment 1

As can be seen from Figure 6, the fitness functions of ISOA, SCA and WOA converged similarly, SOA converged slightly worse, and PSO was the worst. The training results in Table 7 show that ISOA-SVR was the best: its MSE, RMSE and MAPE were 1.299370396, 0.010565973 and 9.064760441, respectively, compared with 1.386464339, 0.012714389 and 9.62641542 for SCA-SVR and 1.367865788, 0.012364791 and 9.463121198 for WOA-SVR. For the prediction results in Table 8, ISOA-SVR had the lowest MSE (3.379206115) and MAPE (14.03810752); its RMSE (0.03090546) was marginally higher than the value listed for SCA-SVR (0.030672679) and lower than those of the other three models. Overall, WOA-SVR was slightly worse than ISOA-SVR, SOA-SVR and SCA-SVR were similar to each other and slightly worse than WOA-SVR, and PSO-SVR was the worst. Moreover, the prediction results in Table 9 show that ISOA-SVR gave the best or second-best prediction in 6 of the 10 months, which was the best among the five algorithms.
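As a cross-check, the error indices can be recomputed from the monthly predictions. The sketch below uses the Actual and ISOA-SVR columns of Table 9; the three metric definitions are assumptions (a plain mean of squared errors, a mean of squared relative errors, and a mean absolute percentage error in percent), because the formulas used by the authors are not stated in this section.

```python
# Actual monthly average AQI and ISOA-SVR predictions, copied from Table 9.
actual = [80, 86, 102, 95, 153, 108, 77, 83, 88, 123]
isoa_svr = [82.77198055, 91.75390427, 108.1564331, 129.8122227, 120.1545213,
            128.1305829, 95.40056339, 94.20694095, 88.36159652, 111.1573034]


def mean_squared_error(y_true, y_pred):
    """Assumed definition: average of squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_mean_squared_error(y_true, y_pred):
    """Assumed definition: average of squared relative errors."""
    return sum(((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Assumed definition: average absolute relative error, in percent."""
    return 100.0 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)


mse_value = mean_squared_error(actual, isoa_svr)
rel_mse_value = relative_mean_squared_error(actual, isoa_svr)
mape_value = mean_absolute_percentage_error(actual, isoa_svr)

print("MSE  (assumed definition):", mse_value)
print("RMSE (assumed definition):", rel_mse_value)
print("MAPE (assumed definition):", mape_value)
```

Under these assumed definitions, the MAPE (about 14.04) and the relative mean square error (about 0.0309) agree with the ISOA-SVR column of Table 8. The plain mean of squared errors (about 337.9) is 100 times the tabulated MSE of 3.379206115, so the MSE in the tables appears to use a scaling that is not stated in this section.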

5.2.2. Results and Analysis of Experiment 2

We can see from Figure 7 that the fitness functions of ISOA, SOA, SCA and WOA converged similarly, whereas PSO converged slightly worse. The training results in Table 10 show that the MSE and RMSE of ISOA-SVR (13.37973604 and 0.163687428) were slightly worse than those of SOA-SVR (12.90917741 and 0.162034198), which were the lowest, although the MAPE of ISOA-SVR (28.1740733) was lower than that of SOA-SVR (28.71513087); ISOA-SVR was better than the other three algorithms on all three indices. The three evaluation indices of the prediction results in Table 11 show that ISOA-SVR was the best (6.147418336, 0.185148465 and 37.28176518), SOA-SVR was slightly worse than ISOA-SVR, SCA-SVR was slightly worse than SOA-SVR but better than WOA-SVR, and PSO-SVR was the worst. Moreover, the prediction results in Table 12 show that ISOA-SVR gave the best or second-best prediction on 7 of the 10 days, which was the best among the five algorithms.
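The count of best or second-best predictions can be reproduced from Table 12. The sketch below ranks the five models by absolute error on each day and counts how often each model is in the top two; the function name top_two_counts is hypothetical, and ranking by absolute error is an assumption about how "optimal or suboptimal" was determined.

```python
# Actual daily AQI and model predictions for 15-24 February 2020, copied from Table 12.
actual = [37, 40, 50, 64, 77, 107, 67, 56, 99, 137]
predictions = {
    "ISOA-SVR": [66.19915498, 63.56384349, 64.31550156, 83.45317703, 98.39158683,
                 102.5547977, 109.1720281, 76.30078067, 77.99825975, 104.6398589],
    "SOA-SVR": [64.1716092, 63.11114188, 65.53302046, 85.22290481, 101.4637515,
                104.7637718, 111.3754232, 76.27921803, 77.10175492, 104.8715531],
    "PSO-SVR": [81.11185736, 78.85186159, 77.72392869, 94.48414881, 107.20132,
                113.8703628, 122.0536657, 92.07265008, 93.3506996, 118.3346012],
    "SCA-SVR": [66.77559426, 64.56873662, 66.30127468, 86.02403533, 101.8584487,
                104.7694695, 110.3169339, 76.87427183, 78.34032184, 104.0343253],
    "WOA-SVR": [68.29006364, 73.61728757, 73.6344107, 92.64726525, 104.1777717,
                104.8558503, 109.5317368, 77.86190836, 85.4347774, 107.4976793],
}


def top_two_counts(y_true, preds):
    """Count, per model, the days on which it has the smallest or second-smallest absolute error."""
    counts = {name: 0 for name in preds}
    for day, truth in enumerate(y_true):
        errors = {name: abs(values[day] - truth) for name, values in preds.items()}
        ranked = sorted(errors, key=errors.get)  # model names, smallest error first
        for name in ranked[:2]:
            counts[name] += 1
    return counts


counts = top_two_counts(actual, predictions)
for name, count in counts.items():
    print(f"{name}: {count} of {len(actual)} days")
```

Under this ranking rule, ISOA-SVR is in the top two on 7 of the 10 days, in agreement with the figure quoted above.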
Combining the results of Experiment 1 and Experiment 2, we can conclude that the proposed ISOA-SVR prediction model had better generalization ability and robustness than the other prediction models tested in this study. This result is attributed to three characteristics of the ISOA algorithm: the first is the exponential balance between local and global optimization; the second is the adaptive ability of the individual particles and of the group; and the third is the diversity of the particles. Together, these help the algorithm avoid falling into local minima during optimization. In addition, the weight parameters introduced in the ISOA algorithm balance the global exploration and local exploitation abilities of the particles and accelerate the convergence of the algorithm.

6. Conclusions

This study focused on air quality evaluation during the COVID-19 lockdown (25 January 2020 to 24 February 2020), before the COVID-19 outbreak (25 December 2019 to 24 January 2020), the same period in 2019 (25 January 2019 to 24 February 2019) and one further period (25 December 2018 to 24 January 2019). First, a comparative analysis of the numerical statistics was made, and then the gray relational analysis (GRA) method was used to analyze the relational degree between six major pollutants and AQI. The results reflect not only the distribution of the air quality index (AQI) and pollutant concentrations in each analysis period, but also the impact of the pollutants on AQI. The air quality during the COVID-19 lockdown period was evaluated. The results show that during the lockdown period the overall air quality was good: days with good or moderate air quality accounted for a large proportion of the period, and there were no severely polluted days. The average AQI and the average concentrations of the five pollutants other than O3 decreased significantly. PM10 was still the main factor affecting AQI. This indicates that air pollution was closely related to pollutant emissions. The prevention and control measures during the lockdown period reduced the emissions of industrial waste gas, automobile exhaust and other pollutants in Taiyuan, and the city's air quality improved greatly. However, coal-fired heating in winter still contributes substantially to the PM10 concentration. In addition, the evaluation results show that the average concentration of O3 increased slightly during the COVID-19 prevention and control period, which deserves attention.
After the evaluation of air quality, this study constructed an ISOA-SVR model for predicting AQI. Air quality is affected by many factors, such as pollution sources and the meteorological environment, which makes accurate AQI prediction difficult. SVR is suitable for AQI prediction, but its performance depends on the selection of model parameters. First, based on the SOA algorithm, this study proposed the ISOA algorithm by introducing nonlinear parameters, considering the influence of the individual extremum and the average fitness of each generation, and increasing the diversity of the particles. Then, the parameters of the SVR model were optimized by the ISOA algorithm. Finally, the optimized ISOA-SVR prediction model was used to predict the monthly average AQI values from September 2019 to June 2020 and the daily AQI values for 15–24 February 2020. The results were compared with those of the SVR prediction models optimized by the SOA, PSO, SCA and WOA algorithms, and they showed that the SVR model optimized by the ISOA algorithm achieved good results in MSE, RMSE and MAPE.
In summary, the air quality evaluation results obtained in this study were consistent with the actual air quality situation in Taiyuan. In addition, the integrated ISOA-SVR prediction model proposed in this study was suitable for predicting AQI in Taiyuan before and after the COVID-19 epidemic. These results can help guide air quality prevention and control in Taiyuan and other places.

Author Contributions

Data collection and experimental works: T.X. Writing, discussion, analysis: T.X., H.Y. and Y.B. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by the National Natural Science Foundation of China (Grant Nos. 61774137, 51875535 and 61927807), the Shanxi Natural Science Foundation (Grant No. 201801D121026) and the Shanxi Scholarship Council of China (Grant No. 2020-104).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are openly available in the historical data query of China air quality online monitoring and analysis platform (https://www.aqistudy.cn/historydata/, accessed on 1 March 2021).

Acknowledgments

This work was financially supported by the National Natural Science Foundation of China (Grant Nos. 61774137, 51875535 and 61927807). The authors also thank the Shanxi Natural Science Foundation (Grant No. 201801D121026) and the Shanxi Scholarship Council of China (Grant No. 2020-104) for their funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Guan, W.-J.; Zheng, X.-Y.; Chung, K.F.; Zhong, N.-S. Impact of air pollution on the burden of chronic respiratory diseases in China: Time for urgent action. Lancet 2016, 388, 1939–1951. [Google Scholar] [CrossRef]
  2. Trenton Straube. Top 10 Threats to Global Health in 2019. 13 February 2019. Available online: https://www.poz.com/article/top-10-threats-global-health-2019 (accessed on 6 January 2021).
  3. Kyrkilis, G.; Chaloulakou, A.; Kassomenos, P.A. Development of an aggregate air quality index for an urban Mediterranean agglomeration: Relation to potential health effects. Environ. Int. 2007, 33, 670–676. [Google Scholar] [CrossRef]
  4. China Ministry of Environmental Protection. Ambient Air Quality Standards; GB 3095–2012; China Environmental Science Press: Beijing, China, 2012.
  5. Siew, L.Y.; Chin, L.Y.; Mah, P.; Wee, J. Arima and integrated Arfima models for forecasting air pollution index in shah alam, Selangor. Malays. J. Anal. Sci. 2008, 12, 257–263. [Google Scholar]
  6. Zhu, J.; Zhang, R.; Fu, B.; Jin, R. Comparison of ARIMA model and exponential smoothing model on 2014 air quality index in yanqing county, Beijing, China. Appl. Comput. Math. 2015, 4, 456. [Google Scholar] [CrossRef] [Green Version]
  7. Drucker, H.; Burges, C.J.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. Adv. Neural Inf. Process. Syst. 1997, 28, 779–784. [Google Scholar]
  8. Basak, D.; Pal, S.; Patranabis, D.C. Support vector regression. Neural Inf. Process. Lett. Rev. 2007, 11, 203–224. [Google Scholar]
  9. Chen, S.-T.; Yu, P.-S.; Tang, Y.-H. Statistical downscaling of daily precipitation using support vector machines and multivariate analysis. J. Hydrol. 2010, 385, 13–22. [Google Scholar] [CrossRef]
  10. Kavousi-Fard, A.; Kavousi-Fard, F. A new hybrid correction method for short-term load forecasting based on ARIMA, SVR and CSA. J. Exp. Theor. Artif. Intell. 2013, 25, 559–574. [Google Scholar] [CrossRef]
  11. Chen, Y.; Xu, P.; Chu, Y.; Li, W.; Wu, Y.; Ni, L.; Bao, Y.; Wang, K. Short-term electrical load forecasting using the Support Vector Regression (SVR) model to calculate the demand response baseline for office buildings. Appl. Energy 2017, 195, 659–670. [Google Scholar] [CrossRef]
  12. Liu, H.; Li, Q.; Yu, D.; Yu, G. Air quality index and air pollutant concentration prediction based on machine learning algo-rithms. Appl. Sci. 2019, 9, 4069. [Google Scholar] [CrossRef] [Green Version]
  13. Castelli, M.; Clemente, F.M.; Popovič, A.; Silva, S.; Vanneschi, L. A machine learning approach to predict air quality in California. Complexity 2020, 2020, 1–23. [Google Scholar] [CrossRef]
  14. Dun, M.; Xu, Z.; Chen, Y.; Wu, L. Short-term air quality prediction based on fractional grey linear regression and support vector machine. Math. Probl. Eng. 2020, 2020, 1–13. [Google Scholar] [CrossRef]
  15. Chen, K.-Y.; Wang, C.-H. Support vector regression with genetic algorithms in forecasting tourism demand. Tour. Manag. 2007, 28, 215–226. [Google Scholar] [CrossRef]
  16. Wu, C.-H.; Tzeng, G.-H.; Lin, R.-H. A novel hybrid genetic algorithm for kernel function and parameter optimization in support vector regression. Expert Syst. Appl. 2009, 36, 4725–4735. [Google Scholar] [CrossRef]
  17. Wang, J.; Li, L.; Niu, D.; Tan, Z. An annual load forecasting model based on support vector regression with differential evolution algorithm. Appl. Energy 2012, 94, 65–70. [Google Scholar] [CrossRef]
  18. Zhang, F.; Deb, C.; Lee, S.E.; Yang, J.; Shah, K.W. Time series forecasting for building energy consumption using weighted support vector regression with differential evolution optimization technique. Energy Build. 2016, 126, 94–103. [Google Scholar] [CrossRef]
  19. Hong, W.-C. Chaotic particle swarm optimization algorithm in a support vector regression electric load forecasting model. Energy Convers. Manag. 2009, 50, 105–117. [Google Scholar] [CrossRef]
  20. Duan, P.; Xie, K.; Guo, T.; Huang, X. Short-term load forecasting for electric power systems using the PSO-SVR and FCM clustering techniques. Energies 2011, 4, 173–184. [Google Scholar] [CrossRef] [Green Version]
  21. Bui, X.-N.; Lee, C.W.; Nguyen, H.; Bui, H.-B.; Long, N.Q.; Le, Q.-T.; Nguyen, V.-D.; Nguyen, N.-B.; Moayedi, H. Estimating PM10 concentration from drilling operations in open-pit mines using an assembly of SVR and PSO. Appl. Sci. 2019, 9, 2806. [Google Scholar] [CrossRef] [Green Version]
  22. Cao, G.; Wu, H. Support vector regression with fruit fly optimization algorithm for seasonal electricity consumption forecasting. Energy 2016, 115, 734–745. [Google Scholar] [CrossRef]
  23. Xu, C.; Amar, M.N.; Ghriga, M.A. Evolving support vector regression using grey wolf optimization; forecasting the geome-chanically properties of rock. Eng. Comput. 2020, 2020, 1–15. [Google Scholar]
  24. Deng, J.L. Introduction to grey system theory. J. Grey Syst. 1989, 1, 1–24. [Google Scholar]
  25. Dhiman, G.; Kumar, V. Seagull optimization algorithm: Theory and its applications for large-scale industrial engineering problems. Knowl. Based Syst. 2019, 165, 169–196. [Google Scholar] [CrossRef]
  26. Eberhart, R.C.; Kennedy, J. A new optimizer using particle swarm theory. In Proceedings of the 6th International Symposium on Micro Machine and Human Science, Nagoya, Japan, 4–6 October 1995; Volume 1, pp. 39–43. [Google Scholar]
  27. Mirjalili, S. SCA: A sine cosine algorithm for solving optimization problems. Knowl. Based Syst. 2016, 96, 120–133. [Google Scholar] [CrossRef]
  28. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67. [Google Scholar] [CrossRef]
  29. China Ministry of Environmental Protection. Technical Regulation on Ambient Air Quality Index (AQI); HJ 633–2012; China Environmental Science Press: Beijing, China, 2012.
Figure 1. A schematic diagram of support vector regression (SVR).
Figure 2. Migration and attack pattern of seagulls.
Figure 3. The proportion of days in each band of air quality index (AQI) in different periods.
Figure 4. The average values of AQI and six main pollution concentrations in different periods of time.
Figure 5. Comparison of the gray relational degree between pollutants and AQI in different periods of time.
Figure 6. Fitness convergence curves of five algorithms in Experiment 1.
Figure 7. Fitness convergence curves of five algorithms in Experiment 2.
Table 1. The days of each band of AQI in different periods.
| Time | Good | Moderate | Lightly Polluted | Moderately Polluted | Heavily Polluted | Severely Polluted | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 25 December 2018–24 January 2019 | 0 | 11 | 10 | 4 | 4 | 2 | 31 |
| 25 January 2019–24 February 2019 | 1 | 13 | 6 | 8 | 3 | 0 | 31 |
| 25 December 2019–24 January 2020 | 2 | 7 | 11 | 2 | 8 | 1 | 31 |
| 25 January 2020–24 February 2020 | 3 | 16 | 9 | 0 | 3 | 0 | 31 |
Table 2. The average values of AQI and six main pollution concentrations in different periods of time.
| Time | AQI | PM2.5 | PM10 | SO2 | CO | NO2 | O3 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 25 December 2018–24 January 2019 | 141 | 102 | 184 | 56 | 1.9 | 74 | 30 |
| 25 January 2019–24 February 2019 | 122 | 86 | 135 | 44 | 1.4 | 50 | 60 |
| 25 December 2019–24 January 2020 | 141 | 105 | 146 | 33 | 1.5 | 61 | 43 |
| 25 January 2020–24 February 2020 | 102 | 73 | 102 | 27 | 1.0 | 36 | 80 |
Table 3. The gray relational degree between pollutants and AQI between 25 December 2019 to 24 January 2020 and 25 January 2020 to 24 February 2020.
| Time | PM2.5 | PM10 | SO2 | CO | NO2 | O3 |
| --- | --- | --- | --- | --- | --- | --- |
| 25 December 2019–24 January 2020 | 0.5946182 | 0.8036469 | 0.3600114 | 0.2916706 | 0.456154 | 0.4247363 |
| 25 January 2020–24 February 2020 | 0.5825637 | 0.8290349 | 0.3772621 | 0.297143 | 0.4210722 | 0.5905432 |
Table 4. The gray relational degree between pollutants and AQI between 25 January 2019–24 February 2019 and 25 January 2020–24 February 2020.
| Time | PM2.5 | PM10 | SO2 | CO | NO2 | O3 |
| --- | --- | --- | --- | --- | --- | --- |
| 25 January 2019–24 February 2019 | 0.5815787 | 0.7384725 | 0.419446 | 0.3033536 | 0.453656 | 0.5344895 |
| 25 January 2020–24 February 2020 | 0.5825637 | 0.8290349 | 0.3772621 | 0.297143 | 0.4210722 | 0.5905432 |
Table 5. The gray relational degree between pollutants and AQI between 25 December 2018 to 24 January 2019 and 25 January 2019 to 24 February 2019.
| Time | PM2.5 | PM10 | SO2 | CO | NO2 | O3 |
| --- | --- | --- | --- | --- | --- | --- |
| 25 December 2018–24 January 2019 | 0.7573002 | 0.731805 | 0.5499309 | 0.3751351 | 0.6669078 | 0.4942023 |
| 25 January 2019–24 February 2019 | 0.5815787 | 0.7384725 | 0.419446 | 0.3033536 | 0.453656 | 0.5344895 |
Table 6. The gray relational degree between pollutants and AQI between 25 December 2018 to 24 January 2019 and 25 December 2019 to 24 January 2020.
| Time | PM2.5 | PM10 | SO2 | CO | NO2 | O3 |
| --- | --- | --- | --- | --- | --- | --- |
| 25 December 2018–24 January 2019 | 0.7573002 | 0.731805 | 0.5499309 | 0.3751351 | 0.6669078 | 0.4942023 |
| 25 December 2019–24 January 2020 | 0.5946182 | 0.8036469 | 0.3600114 | 0.2916706 | 0.456154 | 0.4247363 |
Table 7. The mean square error (MSE), relative mean square error (RMSE) and mean absolute percentage error (MAPE) index results of five models in Experiment 1.
| Index | ISOA-SVR | SOA-SVR | PSO-SVR | SCA-SVR | WOA-SVR |
| --- | --- | --- | --- | --- | --- |
| MSE | 1.299370396 | 1.516275653 | 1.783184745 | 1.386464339 | 1.367865788 |
| RMSE | 0.010565973 | 0.014142911 | 0.014833724 | 0.012714389 | 0.012364791 |
| MAPE | 9.064760441 | 9.853200382 | 9.371637923 | 9.62641542 | 9.463121198 |
Table 8. The results of MSE, RMSE and MAPE of five models in Experiment 1.
| Index | ISOA-SVR | SOA-SVR | PSO-SVR | SCA-SVR | WOA-SVR |
| --- | --- | --- | --- | --- | --- |
| MSE | 3.379206115 | 3.441986222 | 4.349283613 | 3.498150956 | 3.41400161 |
| RMSE | 0.03090546 | 0.03230673 | 0.038231575 | 0.030672679 | 0.031900138 |
| MAPE | 14.03810752 | 15.60277275 | 18.02718993 | 14.65850732 | 14.31309124 |
Table 9. The predicted value of the monthly average AQI index of Taiyuan in 10 months (September 2019 to June 2020) by five models.
| Months | Actual | ISOA-SVR | SOA-SVR | PSO-SVR | SCA-SVR | WOA-SVR |
| --- | --- | --- | --- | --- | --- | --- |
| September 2019 | 80 | 82.77198055 | 85.5977246 | 88.28627583 | 84.48194106 | 82.38016393 |
| October 2019 | 86 | 91.75390427 | 92.79561765 | 104.4254211 | 95.03535344 | 91.52332133 |
| November 2019 | 102 | 108.1564331 | 107.1279498 | 113.0787421 | 109.4848466 | 108.7885621 |
| December 2019 | 95 | 129.8122227 | 124.8458292 | 120.8050922 | 129.2580647 | 130.9492776 |
| January 2020 | 153 | 120.1545213 | 119.9753754 | 109.4081433 | 118.2433733 | 120.5372071 |
| February 2020 | 108 | 128.1305829 | 127.0634554 | 127.8660077 | 128.9146531 | 127.4201454 |
| March 2020 | 77 | 95.40056339 | 99.44739009 | 92.53519639 | 93.96113427 | 95.63228132 |
| April 2020 | 83 | 94.20694095 | 95.92648075 | 101.5208532 | 89.12996596 | 95.75164039 |
| May 2020 | 88 | 88.36159652 | 94.17246856 | 104.1960515 | 83.85785511 | 90.30725226 |
| June 2020 | 123 | 111.1573034 | 106.1393318 | 119.698614 | 109.623607 | 113.3392755 |
Table 10. The MSE, RMSE and MAPE index results of five models in Experiment 2.
| Index | ISOA-SVR | SOA-SVR | PSO-SVR | SCA-SVR | WOA-SVR |
| --- | --- | --- | --- | --- | --- |
| MSE | 13.37973604 | 12.90917741 | 15.26069105 | 13.38228635 | 13.67762998 |
| RMSE | 0.163687428 | 0.162034198 | 0.263342202 | 0.164932562 | 0.176975593 |
| MAPE | 28.1740733 | 28.71513087 | 36.06950441 | 28.34307946 | 30.06256388 |
Table 11. The results of MSE, RMSE and MAPE of five models in Experiment 2.
| Index | ISOA-SVR | SOA-SVR | PSO-SVR | SCA-SVR | WOA-SVR |
| --- | --- | --- | --- | --- | --- |
| MSE | 6.147418336 | 6.459778601 | 10.82499818 | 6.689566261 | 7.5729638 |
| RMSE | 0.185148465 | 0.185470358 | 0.416901153 | 0.201263888 | 0.259079722 |
| MAPE | 37.28176518 | 37.73182296 | 53.0989463 | 39.01383564 | 43.56969294 |
Table 12. The predicted value of the daily AQI of Taiyuan for 10 days (15 February 2020 to 24 February 2020) by five models.
| Days | Actual | ISOA-SVR | SOA-SVR | PSO-SVR | SCA-SVR | WOA-SVR |
| --- | --- | --- | --- | --- | --- | --- |
| 15 February 2020 | 37 | 66.19915498 | 64.1716092 | 81.11185736 | 66.77559426 | 68.29006364 |
| 16 February 2020 | 40 | 63.56384349 | 63.11114188 | 78.85186159 | 64.56873662 | 73.61728757 |
| 17 February 2020 | 50 | 64.31550156 | 65.53302046 | 77.72392869 | 66.30127468 | 73.6344107 |
| 18 February 2020 | 64 | 83.45317703 | 85.22290481 | 94.48414881 | 86.02403533 | 92.64726525 |
| 19 February 2020 | 77 | 98.39158683 | 101.4637515 | 107.20132 | 101.8584487 | 104.1777717 |
| 20 February 2020 | 107 | 102.5547977 | 104.7637718 | 113.8703628 | 104.7694695 | 104.8558503 |
| 21 February 2020 | 67 | 109.1720281 | 111.3754232 | 122.0536657 | 110.3169339 | 109.5317368 |
| 22 February 2020 | 56 | 76.30078067 | 76.27921803 | 92.07265008 | 76.87427183 | 77.86190836 |
| 23 February 2020 | 99 | 77.99825975 | 77.10175492 | 93.3506996 | 78.34032184 | 85.4347774 |
| 24 February 2020 | 137 | 104.6398589 | 104.8715531 | 118.3346012 | 104.0343253 | 107.4976793 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Xu, T.; Yan, H.; Bai, Y. Air Pollutant Analysis and AQI Prediction Based on GRA and Improved SOA-SVR by Considering COVID-19. Atmosphere 2021, 12, 336. https://doi.org/10.3390/atmos12030336
