GAC-GEO: a generic agglomerative clustering framework for geo-referenced datasets

  • Authors:
  • Rachsuda Jiamthapthaksin;Christoph F. Eick;Seungchan Lee

  • Affiliations:
  • Assumption University, Computer Science Department, Faculty of Science and Technology, Bangkok, Thailand and University of Houston, Computer Science Department, Houston, TX, USA;University of Houston, Computer Science Department, Houston, TX, USA;University of Houston, Computer Science Department, Houston, TX, USA

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Major challenges of clustering geo-referenced data include identifying arbitrarily shaped clusters, properly utilizing spatial information, coping with diverse extrinsic characteristics of clusters and supporting region discovery tasks. The goal of region discovery is to identify interesting regions in geo-referenced datasets based on a domain expert’s notion of interestingness. Almost all agglomerative clustering algorithms only focus on the first challenge. The goal of the proposed work is to develop agglomerative clustering frameworks that deal with all four challenges. In particular, we propose a generic agglomerative clustering framework for geo-referenced datasets (GAC-GEO) generalizing agglomerative clustering by allowing for three plug-in components. GAC-GEO agglomerates neighboring clusters maximizing a plug-in fitness function that capture the notion of interestingness of clusters. It enhances typical agglomerative clustering algorithms in two ways: fitness functions support task-specific clustering, whereas generic neighboring relationships increase the number of merging candidates. We also demonstrate that existing agglomerative clustering algorithms can be considered as specific cases of GAC-GEO. We evaluate the proposed framework on an artificial dataset and two real-world applications involving region discovery. The experimental results show that GAC-GEO is capable of identifying arbitrarily shaped hotspots for different data mining tasks.