Regional Pattern Discovery in Geo-referenced Datasets Using PCA

Authors:
Oner Ulvi Celepcikay; Christoph F. Eick; Carlos Ordonez
Affiliations:
Department of Computer Science, University of Houston, Houston, TX 77204-3010 (all authors)
Venue:
MLDM '09: Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2009

Abstract

Existing data mining techniques mostly focus on finding global patterns and lack the ability to systematically discover regional patterns. Because most relationships in spatial datasets are regional, there is a great need to extract regional knowledge from such datasets. This paper proposes a novel framework for discovering interesting regions characterized by "strong regional correlation relationships" between attributes, together with methods for analyzing differences and similarities between regions. The framework takes a two-phase approach: it first discovers regions using clustering algorithms that maximize a PCA-based fitness function, and then applies post-processing techniques to explain the underlying regional structures and correlation patterns. Additionally, we introduce a new similarity measure that assesses the structural similarity of regions based on their correlation sets. We evaluate the framework in a case study that centers on finding correlations between arsenic pollution and other factors in water wells, and demonstrate that it effectively identifies regional correlation patterns.
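The abstract does not define the PCA-based fitness function or the correlation-set similarity measure, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that (1) a region is interesting when the top-k principal components of its standardized attributes (PCA on the correlation matrix) explain most of the variance, (2) the clustering fitness sums interestingness times size raised to a power beta over all regions, and (3) a "correlation set" is the set of attribute pairs whose absolute Pearson correlation reaches a threshold, compared with Jaccard similarity. All function names and parameters (`k`, `beta`, `threshold`) are hypothetical.

```python
import numpy as np

# HYPOTHETICAL sketch: illustrates ideas named in the abstract; these are
# assumed definitions, not the paper's published formulas.


def region_interestingness(X, k=1):
    """Share of total variance captured by the top-k principal components of
    a region's standardized attributes (eigenvalues of the correlation matrix).
    Values near 1 mean the region's attributes are strongly correlated."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[0] < 2 or X.shape[1] < 2:
        raise ValueError("need at least 2 objects and 2 attributes")
    corr = np.corrcoef(X, rowvar=False)       # assumes no constant attribute
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigenvalues, largest first
    return float(eigvals[:k].sum() / eigvals.sum())


def fitness(regions, k=1, beta=1.1):
    """Assumed reward form: sum of interestingness * size**beta over regions.
    beta > 1 favors fewer, larger regions over many small ones."""
    return sum(region_interestingness(X, k) * len(X) ** beta for X in regions)


def correlation_set(X, names, threshold=0.5):
    """Attribute pairs with |Pearson r| >= threshold, tagged with the sign."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    result = set()
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                result.add((names[i], names[j], "+" if corr[i, j] > 0 else "-"))
    return result


def correlation_set_similarity(a, b):
    """Jaccard similarity of two correlation sets (a stand-in measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy regions (made-up data): attributes move together vs. uncorrelated.
    north = [[1, 2], [2, 4], [3, 6], [4, 8]]
    south = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
    names = ["attr_1", "attr_2"]
    print(region_interestingness(north), region_interestingness(south))
    print(fitness([north, south], beta=1.0))
    print(correlation_set_similarity(correlation_set(north, names),
                                     correlation_set(south, names)))
```

In this sketch a clustering algorithm would search for a partition of the wells into regions that maximizes `fitness`; the exponent `beta` trades off strong correlation against region size. Tagging each pair with its sign lets the similarity measure distinguish two regions in which the same attributes are correlated in opposite directions.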