Towards long-lead forecasting of extreme flood events: a data mining framework for precipitation cluster precursors identification

  • Authors:
  • Dawei Wang;Wei Ding;Kui Yu;Xindong Wu;Ping Chen;David L. Small;Shafiqul Islam

  • Affiliations:
  • University of Massachusetts Boston, Boston, MA, USA;University of Massachusetts Boston, Boston, MA, USA;Hefei University of Technology, China, Hefei, China;University of Vermont, Burlington, VT, USA;University of Houston-Downtown, Houston, TX, USA;Tufts University, Boston, MA, USA;Tufts University, Boston, MA, USA

  • Venue:
  • Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2013

Abstract

The development of forecasting techniques for disastrous floods that can provide warnings at a long lead time (5-15 days) is of great importance to society. An extreme flood is usually the consequence of a sequence of precipitation events occurring over several days to several weeks. Though precise short-term forecasting of the magnitude and extent of an individual precipitation event is still beyond our reach, long-term forecasting of precipitation clusters can be attempted by identifying persistent atmospheric regimes that are conducive to such clusters. However, such forecasting suffers from an overwhelming number of relevant features and highly imbalanced sample sets. In this paper, we propose an integrated data mining framework that identifies the precursors to precipitation event clusters and uses this information to predict extended periods of extreme precipitation and subsequent floods. We synthesize a representative feature set that describes atmospheric motion, and apply a streaming feature selection algorithm to identify the precipitation precursors online from the enormous feature space. A hierarchical re-sampling approach is embedded in the framework to deal with the imbalance problem. An extensive empirical study is conducted on historical precipitation and associated flood data collected in the State of Iowa. Using our framework, a few physically meaningful precipitation cluster precursor sets are identified from millions of features. More than 90% of extreme precipitation events are captured by the proposed prediction model using precipitation cluster precursors with a lead time of more than 5 days.
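
The abstract does not say which streaming feature selection algorithm is used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes features arrive one at a time and are kept when they are sufficiently correlated with the label and not nearly duplicated by a feature already kept. The names `pearson` and `stream_select` and both thresholds are hypothetical choices made for illustration.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def stream_select(
    features: Iterable[Tuple[str, Sequence[float]]],
    labels: Sequence[float],
    relevance: float = 0.3,   # assumed threshold, not from the paper
    redundancy: float = 0.9,  # assumed threshold, not from the paper
) -> List[str]:
    """Process features one at a time and return the names that are kept."""
    selected: List[Tuple[str, Sequence[float]]] = []
    for name, column in features:
        # Discard features that are only weakly related to the label.
        if abs(pearson(column, labels)) < relevance:
            continue
        # Discard features nearly duplicated by one already kept.
        if any(abs(pearson(column, kept)) >= redundancy for _, kept in selected):
            continue
        selected.append((name, column))
    return [name for name, _ in selected]
```

Because each candidate is compared only against the features kept so far, the full feature space never has to be held in memory at once, which is the property that makes a streaming approach attractive when candidates number in the millions.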