Journal of Shandong University (Engineering Science), 2020, Vol. 50, Issue (2): 91-99. doi: 10.6040/j.issn.1672-3961.0.2019.404

• Machine Learning & Data Mining •

Air quality prediction approach based on integrating forecasting dataset

Minghe GAO1, Ying ZHANG1,*, Rongrong ZHANG1, Zihao HUANG1, Linyan HUANG1, Fanyu LI1, Xin ZHANG2, Yanhao WANG1

    1. School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
    2. School of Computer Science and Technology, Changchun University of Science and Technology, Jilin 130022, China
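  • Received: 2019-07-18 Online: 2020-04-20 Published: 2020-04-16
  • Contact: Ying ZHANG E-mail: 1010619625@qq.com; dearzppzpp@163.com
  • Supported by:
    Fundamental Research Funds for the Central Universities (2018MS024); National Natural Science Foundation of China (61305056); Jilin Province Science and Technology Development Plan Project (20190303133SF)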

Abstract:

To address the problem of air quality prediction, an approach based on LightGBM and forecast-derived features was designed to predict the PM2.5 concentration, the key indicator of air quality, over the upcoming 24 hours in Beijing. In building the prediction solution, the features of the training dataset were analyzed to guide data cleansing, and random forest and linear interpolation were used to handle the high proportion of missing data and noise interference. Forecast data features were integrated into the dataset, and corresponding statistical features were designed to improve prediction accuracy. A sliding window mechanism was used to mine high-dimensional temporal features and increase the number of data features. The performance and results of the proposed approach were analyzed in detail by comparison with baseline models. The experimental results showed that the proposed LightGBM-based approach with integrated forecast data achieved higher prediction accuracy than the other models.

Key words: predictive data fusion, high dimensional statistical features, air quality prediction, machine learning

CLC Number: 

  • TP18

Fig.1

PM2.5 concentration trends within two days

Fig.2

Variation of PM2.5 concentration at three stations

Table 1

Data missing condition

| Parameter | Missing records | Missing ratio/% |
|---|---|---|
| PM10 | 83263 | 26.7718 |
| CO | 42813 | 13.7658 |
| O3 | 20421 | 6.5660 |
| PM2.5 | 20389 | 6.5557 |
| NO2 | 18651 | 5.9969 |
| SO2 | 18548 | 5.9638 |
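
The abstract names linear interpolation (alongside random forest) as one way the missing readings counted in Table 1 were filled. The following is a minimal sketch of gap filling by linear interpolation over an hourly series, in plain Python. The function name `fill_linear` and the sample values are hypothetical, and leaving leading or trailing gaps unfilled is an assumption of this sketch, not something the source specifies.

```python
from typing import List, Optional


def fill_linear(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior runs of None by linear interpolation between neighbours.

    Gaps at the start or end have only one known neighbour, so they are
    left as None in this sketch.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    # Walk over consecutive pairs of known points and fill what lies between.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


# Hypothetical hourly PM2.5 readings with a two-hour interior gap
# and one trailing gap that stays unfilled.
pm25 = [30.0, None, None, 60.0, 58.0, None]
print(fill_linear(pm25))
```

Long gaps are where a straight line is least trustworthy, which may be why the abstract pairs interpolation with a model-based method.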

Fig.3

Distribution of grid nodes and stations

Fig.4

Data fusion process

Fig.5

Principle of sliding window
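
The abstract describes a sliding window mechanism that turns historical readings into many time features. A minimal sketch of that idea in plain Python follows. The function name `lag_features`, the convention that suffix `_1` means one hour earlier, and the sample values are assumptions made for illustration.

```python
from typing import Dict, List


def lag_features(values: List[float], window: int, name: str) -> List[Dict[str, float]]:
    """Slide a fixed-length window over an hourly series.

    For each hour t that has at least `window` earlier readings, emit one
    row with keys name_1 (one hour before t) up to name_window
    (window hours before t). The first row corresponds to t = window.
    """
    rows = []
    for t in range(window, len(values)):
        rows.append({f"{name}_{k}": values[t - k] for k in range(1, window + 1)})
    return rows


# Hypothetical CO readings; a window of 3 keeps the output readable,
# a longer history only changes the `window` argument.
co = [0.8, 0.9, 1.1, 1.0, 1.3]
for row in lag_features(co, window=3, name="CO"):
    print(row)
```

Each step of the window yields one training row, so a series of n readings produces n minus `window` rows with `window` features each.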

Table 2

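Feature list

| Category | Name | Description |
|---|---|---|
| Temporal features | day_of_week | Day of the week, 1: Monday, 2: Tuesday, …, 7: Sunday |
| | day_of_hour | Hour of the day, 00:00 to 23:00 |
| | day_of_month | Day-of-month index, 0: 1st, 1: 2nd, …, 30: 31st |
| | isweekend | Whether the day falls on a weekend |
| | hour_to_predict | Hour to be predicted, 0~24 h |
| | CO_1, …, CO_144 | CO features over the previous 144 h |
| | h_temperature_1, …, h_temperature_144 | Air temperature features over the previous 144 h |
| Meteorological features | humity | Humidity at the given time |
| | weather | Weather condition at the given time |
| | temperature | Temperature at the given time |
| | wind_direction | Wind direction at the given time |
| | wind_speed | Wind speed at the given time |
| | pressure | Air pressure at the given time |
| Air quality features | CO | CO mass concentration at the given time |
| | PM10 | PM10 mass concentration at the given time |
| | NO2 | NO2 mass concentration at the given time |
| | O3 | O3 mass concentration at the given time |
| | SO2 | SO2 mass concentration at the given time |
| Weather forecast features | temperature_1, …, temperature_24 | Temperature over the next 24 h |
| | humity_1, …, humity_24 | Humidity over the next 24 h |
| | pressure_1, …, pressure_24 | Air pressure over the next 24 h |
| | weather_1, …, weather_24 | Weather condition over the next 24 h |
| | wind_direction_1, …, wind_direction24 | Wind direction over the next 24 h |
| | wind_speed_1, …, wind_speed_24 | Wind speed over the next 24 h |
| Statistical features | mean_pm25\PM10\O3_1, mean_pm25\PM10\O3_3, mean_pm25\PM10\O3_5 | Mean mass concentration of PM2.5\PM10\O3 over the previous 1, 3 and 5 d |
| | max_pm25\PM10\O3_1, max_pm25\PM10\O3_3, max_pm25\PM10\O3_5 | Maximum mass concentration of PM2.5\PM10\O3 over the previous 1, 3 and 5 d |
| | min_pm25\PM10\O3_1, min_pm25\PM10\O3_3, min_pm25\PM10\O3_5 | Minimum mass concentration of PM2.5\PM10\O3 over the previous 1, 3 and 5 d |
| | pm25_13\O3_13\pm10_13 | Ratio of the previous 1 d to the previous 3 d mean mass concentration of PM2.5\O3\PM10 |
| | pm25_35\O3_35\pm10_35 | Ratio of the previous 3 d to the previous 5 d mean mass concentration of PM2.5\O3\PM10 |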
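
The statistical features in the feature list (mean, maximum and minimum over the previous 1, 3 and 5 days, plus the 1 d to 3 d and 3 d to 5 d mean ratios) can be sketched in plain Python as follows. The function name `stat_features`, the choice of 24 readings per day, and the sample data are assumptions for illustration. The key names only mirror the naming pattern of the list, and the sketch assumes the means are non-zero so the ratios are defined.

```python
from statistics import mean
from typing import Dict, List


def stat_features(hourly: List[float], name: str = "pm25") -> Dict[str, float]:
    """Summarise the most recent 1, 3 and 5 days of hourly readings.

    `hourly` is ordered oldest to newest and must cover at least 5 days
    (120 readings, assuming one reading per hour).
    """
    if len(hourly) < 5 * 24:
        raise ValueError("need at least 120 hourly readings")
    feats: Dict[str, float] = {}
    for days in (1, 3, 5):
        window = hourly[-24 * days:]
        feats[f"mean_{name}_{days}"] = mean(window)
        feats[f"max_{name}_{days}"] = max(window)
        feats[f"min_{name}_{days}"] = min(window)
    # Ratios of shorter-term to longer-term mean concentration.
    feats[f"{name}_13"] = feats[f"mean_{name}_1"] / feats[f"mean_{name}_3"]
    feats[f"{name}_35"] = feats[f"mean_{name}_3"] / feats[f"mean_{name}_5"]
    return feats


# Hypothetical data: four days at 40 followed by one day at 80.
history = [40.0] * 96 + [80.0] * 24
features = stat_features(history)
print(features["mean_pm25_1"], features["pm25_13"], features["pm25_35"])
```

A ratio above 1 indicates that the most recent period was more polluted than the longer baseline, which gives the model a compact trend signal.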

Fig.6

Daily average concentration variation of PM2.5 during different days

Fig.7

PM2.5 concentration variation under different weathers

Fig.8

Heat map of the interrelated features

Table 3

Statistical table of air quality dataset (μg/m3)

| Item | PM2.5 | PM10 | NO2 | CO | O3 | SO2 |
|---|---|---|---|---|---|---|
| Mean | 58.8 | 88.1 | 45.8 | 1.0 | 55.7 | 8.98 |
| Std | 66.1 | 89.3 | 32.1 | 1.0 | 53.8 | 11.7 |
| Min | 2.0 | 5.0 | 1.0 | 0.1 | 1.0 | 1.0 |
| 25% | 16.0 | 37.0 | 20.0 | 0.4 | 29.1 | 2.0 |
| 50% | 39.0 | 70.0 | 39.0 | 0.7 | 45.0 | 5.0 |
| 75% | 77.0 | 113.0 | 66.0 | 1.2 | 79.0 | 11.0 |
| Max | 1004.0 | 3000.0 | 300.0 | 15.0 | 504.0 | 307.0 |
| Count | 290621 | 227747 | 292359 | 268197 | 290589 | 292462 |

Table 4

Statistical table of meteorological dataset

| Item | Temperature/℃ | Pressure/hPa | Humidity/% | Wind direction/(°) | Wind speed/(m·s-1) |
|---|---|---|---|---|---|
| Mean | 38.2 | 1026.8 | 37.1 | 35487.5 | 9.8 |
| Std | 5030.6 | 5025.7 | 18.9 | 184454.8 | 5.5 |
| Min | -21.3 | 940.0 | 5.0 | 0.0 | 0.1 |
| 25% | 2.5 | 994.2 | 23.0 | 78.0 | 5.6 |
| 50% | 13.8 | 1005.6 | 33.0 | 48.0 | 8.5 |
| 75% | 23.2 | 1016.9 | 48.0 | 280.0 | 12.9 |
| Max | 999999.0 | 999999.0 | 100.0 | 999999.0 | 30.0 |
| Count | 158047 | 158047 | 158047 | 157813 | 157813 |
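
The maxima of 999999.0 for temperature, pressure and wind direction in Table 4 lie far outside any physical range and inflate the reported means and standard deviations. A plausible reading, which is an assumption here rather than something the source states, is that this value marks an invalid record. Under that assumption, a data cleansing step could mask such values before any statistics or imputation; the names `SENTINEL`, `mask_sentinel` and `mean_ignoring_missing` below are hypothetical.

```python
from statistics import mean
from typing import List, Optional

SENTINEL = 999999.0  # assumed placeholder for an invalid reading


def mask_sentinel(values: List[float], sentinel: float = SENTINEL) -> List[Optional[float]]:
    """Replace sentinel readings with None so they can be imputed later."""
    return [None if v == sentinel else v for v in values]


def mean_ignoring_missing(values: List[Optional[float]]) -> float:
    """Mean over the readings that are present."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no valid readings")
    return mean(present)


# Hypothetical temperature readings in degrees Celsius with one bad record.
temps = [12.5, 13.0, 999999.0, 14.5]
cleaned = mask_sentinel(temps)
print(mean(temps))                     # dominated by the placeholder value
print(mean_ignoring_missing(cleaned))  # mean of the three valid readings
```

Masking first and imputing afterwards keeps the placeholder from leaking into interpolated values or window statistics.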

Fig.9

Distribution of meteorological stations and monitoring stations

Fig.10

Scatter plot of actual values and prediction values

Table 5

Comparison among models

| Model | S | M | A |
|---|---|---|---|
| XGBoost | 0.4307 | 33.0948 | 27.0545 |
| GBDT | 0.4329 | 33.3060 | 27.2638 |
| Proposed method | 0.4229 | 32.8711 | 26.4360 |
| DNN | 0.5406 | 42.5152 | 33.4365 |
| LightGBM (without forecast data) | 0.4298 | 33.8923 | 26.6825 |
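
The column headers S, M and A are not expanded in this listing. Assuming, purely for illustration, that they stand for SMAPE, RMSE and MAE, three error measures common in PM2.5 forecasting, the sketch below shows how such scores could be computed. The SMAPE variant used here (range 0 to 2) is one of several in circulation, and the sample values are hypothetical, so none of this should be read as the authors' exact definitions.

```python
import math
from typing import List


def smape(actual: List[float], predicted: List[float]) -> float:
    """Symmetric mean absolute percentage error, in the 0 to 2 range."""
    terms = []
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # A pair of zeros counts as a perfect prediction.
        terms.append(0.0 if denom == 0 else abs(p - a) / denom)
    return sum(terms) / len(terms)


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed and predicted PM2.5 concentrations.
observed = [10.0, 20.0]
forecast = [12.0, 18.0]
print(smape(observed, forecast), rmse(observed, forecast), mae(observed, forecast))
```

Lower values are better for all three measures, which matches the ordering in the table, where the proposed method has the smallest value in every column.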