Regional Pattern Discovery in Geo-referenced Datasets Using PCA

  • Authors:
  • Oner Ulvi Celepcikay;Christoph F. Eick;Carlos Ordonez

  • Affiliations:
  • Department of Computer Science, University of Houston, Houston 77204-3010;Department of Computer Science, University of Houston, Houston 77204-3010;Department of Computer Science, University of Houston, Houston 77204-3010

  • Venue:
  • MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2009

Abstract

Existing data mining techniques mostly focus on finding global patterns and lack the ability to systematically discover regional patterns. Most relationships in spatial datasets are regional; therefore, there is a great need to extract regional knowledge from spatial datasets. This paper proposes a novel framework to discover interesting regions characterized by "strong regional correlation relationships" between attributes, together with methods to analyze differences and similarities between regions. The framework employs a two-phase approach: it first discovers regions by employing clustering algorithms that maximize a PCA-based fitness function, and then applies post-processing techniques to explain the underlying regional structures and correlation patterns. Additionally, a new similarity measure that assesses the structural similarity of regions based on correlation sets is introduced. We evaluate our framework in a case study that centers on finding correlations between arsenic pollution and other factors in water wells, and demonstrate that our framework effectively identifies regional correlation patterns.
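
The abstract does not give the form of the PCA-based fitness function, so the following is only an illustrative sketch of what such a function could look like, not the authors' definition. It assumes that a region's interestingness is the share of variance captured by the first principal component of its attribute correlation matrix, and that a clustering is scored by summing interestingness weighted by region size raised to a power `beta`. The names `region_interestingness`, `clustering_fitness`, and `beta` are hypothetical.

```python
import numpy as np


def region_interestingness(X):
    """Share of variance captured by the first principal component.

    X has shape (n_objects, n_attributes). This reward is an assumption
    made for illustration; the paper's actual measure may differ.
    """
    X = np.asarray(X, dtype=float)
    # Too few objects or attributes: no meaningful correlation structure.
    if X.ndim != 2 or X.shape[0] < 3 or X.shape[1] < 2:
        return 0.0
    # A constant attribute makes the correlation matrix undefined.
    if np.any(X.std(axis=0) == 0):
        return 0.0
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)  # eigenvalues in ascending order
    return float(eigvals[-1] / eigvals.sum())


def clustering_fitness(X, labels, beta=1.1):
    """Sum of interestingness * size**beta over all regions (assumed form)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for label in np.unique(labels):
        members = X[labels == label]
        total += region_interestingness(members) * len(members) ** beta
    return total


# Toy data: one region with perfectly correlated attributes,
# one region with uncorrelated attributes.
correlated = np.array([[1, 2], [2, 4], [3, 6], [4, 8]])
uncorrelated = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
data = np.vstack([correlated, uncorrelated])
region_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

score_correlated = region_interestingness(correlated)
score_uncorrelated = region_interestingness(uncorrelated)
fitness = clustering_fitness(data, region_labels, beta=1.0)
```

Under these assumptions, a clustering algorithm in the first phase would search for the assignment of objects to regions that maximizes `clustering_fitness`; with `beta` above 1, larger regions are favored over many small ones of equal interestingness.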