PM2.5 concentration prediction using hidden semi-Markov model-based times series data mining

  • Authors:
  • Ming Dong; Dong Yang; Yan Kuang; David He; Serap Erdal; Donna Kenski

  • Affiliations:
  • Department of Industrial Engineering and Management, School of Mechanical Engineering, Shanghai Jiao Tong University, 800 Dong-chuan Road, Shanghai 200240, PR China; Department of Industrial Engineering and Management, School of Mechanical Engineering, Shanghai Jiao Tong University, 800 Dong-chuan Road, Shanghai 200240, PR China; General Electric (Shanghai) Corporation, 1800 Cai Lun Road, Shanghai 201203, PR China; Department of Mechanical and Industrial Engineering, 842 West Taylor Street, University of Illinois-Chicago, Chicago, IL 60607, USA; Environmental and Occupational Health Sciences, School of Public Health, University of Illinois-Chicago, Chicago, IL 60612, USA; Lake Michigan Air Directors Consortium, 2250 E. Devon Ave., Suite 250, Des Plaines, IL 60018, USA

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2009

Quantified Score

Hi-index 12.06

Abstract

In this paper, a novel framework and methodology based on hidden semi-Markov models (HSMMs) for predicting high PM2.5 concentration values is presented. Because it lacks an explicit time structure and retains only a short-term memory of past history, a standard hidden Markov model (HMM) has limited power in modeling the temporal structure of prediction problems. To overcome these limitations, we develop HSMMs by adding temporal structures to HMMs and use them to predict PM2.5 concentration levels. As a model-driven statistical learning method, an HSMM assumes that both data and a mathematical model are available. In contrast to data-driven statistical prediction models such as neural networks, HSMMs allow a mathematical functional mapping to be established between the parameters and the selected input variables. In the proposed framework, the states of the HSMMs represent PM2.5 concentration levels. The model parameters are estimated through a modified forward-backward training algorithm, and the re-estimation formulae for the model parameters are derived. The trained HSMMs can then be used to predict high PM2.5 concentration levels. The proposed framework and methodology are validated in a real-world application: prediction of high PM2.5 concentrations at O'Hare airport in Chicago. The results show that the HSMMs provide accurate predictions of high PM2.5 concentration levels for the next 24 h.
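
To illustrate what adding temporal structure to an HMM can look like, the following is a minimal sketch of the forward pass of a generic explicit-duration HSMM, in which each state carries its own duration distribution. It is not the modified forward-backward algorithm derived in the paper; the function name `hsmm_forward`, the two-state "low"/"high" setup, and every numeric parameter are assumptions chosen only for illustration.

```python
from math import prod


def hsmm_forward(obs, pi, A, B, dur):
    """Likelihood of obs under a generic explicit-duration HSMM (illustrative).

    pi[j]       : probability that the first segment is in state j
    A[i][j]     : probability of moving from state i to j at a segment boundary
    B[j][o]     : probability of emitting symbol o while in state j
    dur[j][d-1] : probability that a visit to state j lasts exactly d steps
    Assumes the last segment ends exactly at the final observation.
    """
    T, N = len(obs), len(pi)
    # alpha[t][j] = P(obs[:t], a segment in state j ends at time t)
    alpha = [[0.0] * N for _ in range(T + 1)]
    for t in range(1, T + 1):
        for j in range(N):
            total = 0.0
            # Sum over every admissible length d of the segment ending at t.
            for d in range(1, min(len(dur[j]), t) + 1):
                start = t - d
                if start == 0:
                    enter = pi[j]  # this segment is the first one
                else:
                    enter = sum(alpha[start][i] * A[i][j] for i in range(N))
                emit = prod(B[j][o] for o in obs[start:t])
                total += enter * dur[j][d - 1] * emit
            alpha[t][j] = total
    return sum(alpha[T])


# Hypothetical setup: state 0 = "low" PM2.5 level, state 1 = "high" level;
# observation symbols 0/1 stand for a discretized input variable.
pi = [0.6, 0.4]
A = [[0.0, 1.0], [1.0, 0.0]]          # no self-transitions; durations handle dwell time
B = [[0.9, 0.1], [0.3, 0.7]]
dur = [[0.2, 0.5, 0.3], [0.6, 0.4]]   # made-up duration pmfs (max 3 and 2 steps)
print(hsmm_forward([0, 0, 1, 1, 0], pi, A, B, dur))
```

When every duration distribution puts all of its mass on a single step, this recursion reduces to the ordinary HMM forward algorithm, which is one way to see the HSMM as an HMM with an added duration model.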