Data Structures for Minimization of Total Within-Group Distance for Spatio-temporal Clustering

Authors:
Vladimir Estivill-Castro;Michael E. Houle
Affiliations:
-;-
Venue:
PKDD '01 Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery
Year:
2001

Citing 12
Cited 0

The use of non-hierarchical allocation methods for clustering large sets of data. Australian Computer Journal.
Robust regression and outlier detection.
The design and analysis of spatial data structures.
Graph drawing by force-directed placement. Software—Practice & Experience.
BIRCH: an efficient data clustering method for very large databases. SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data.
Analysis of aggregation errors for the p-median problem. Computers and Operations Research - Special issue on aggregation and disaggregation in operations research.
The demand partitioning method for reducing aggregation errors in p-median problems. Computers and Operations Research - Special issue on aggregation and disaggregation in operations research.
Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases.
Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification. SSD '95 Proceedings of the 4th International Symposium on Advances in Spatial Databases.
The BANG-Clustering System: Grid-Based Data Analysis. IDA '97 Proceedings of the Second International Symposium on Advances in Intelligent Data Analysis, Reasoning about Data.
STING: A Statistical Information Grid Approach to Spatial Data Mining. VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases.
Fast and robust general purpose clustering algorithms. PRICAI'00 Proceedings of the 6th Pacific Rim international conference on Artificial intelligence.

Abstract

Statistical principles suggest minimization of the total within-group distance (TWGD) as a robust criterion for clustering point data associated with a Geographical Information System [17]. Although this NP-hard problem admits a linear programming formulation, in practice it must be solved with heuristic methods. The heuristics proposed so far require quadratic time, which is prohibitively expensive for data mining applications. This paper introduces data structures for managing large two-dimensional point data sets and for fast clustering via interchange heuristics. These structures avoid quadratic time by approximating proximity information. Our scheme is illustrated with two-dimensional quadtrees, but it can be extended to other structures suited to three-dimensional data or to spatial data with time stamps. The result is a fast and robust clustering method.
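
To make the criterion concrete, the following Python sketch shows one possible reading of TWGD and a naive interchange heuristic. The abstract does not define TWGD formally, so the sketch assumes it is the sum of pairwise Euclidean distances between points that share a group. The function names (`twgd`, `dist_to_group`, `interchange`) and the sample points are hypothetical and chosen only for illustration. This is the kind of quadratic-time baseline the abstract describes as too expensive, not the quadtree-based scheme the paper introduces.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def twgd(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Assumed TWGD: sum of pairwise distances between points in the same group."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] == labels[j]:
                total += math.dist(points[i], points[j])
    return total


def dist_to_group(points: Sequence[Point], labels: Sequence[int], i: int, g: int) -> float:
    """Sum of distances from point i to every other point currently in group g."""
    return sum(
        math.dist(points[i], points[j])
        for j in range(len(points))
        if j != i and labels[j] == g
    )


def interchange(points: Sequence[Point], labels: Sequence[int], k: int,
                max_passes: int = 100) -> List[int]:
    """Naive interchange: move a point to another group whenever that lowers TWGD.

    Moving point i from group a to group b changes the assumed TWGD by
    dist_to_group(i, b) - dist_to_group(i, a), so every accepted move is a
    strict improvement. Each pass costs O(n^2) distance evaluations.
    """
    labels = list(labels)
    for _ in range(max_passes):
        moved = False
        for i in range(len(points)):
            costs = [dist_to_group(points, labels, i, g) for g in range(k)]
            best = min(range(k), key=costs.__getitem__)
            if costs[best] < costs[labels[i]] - 1e-12:
                labels[i] = best
                moved = True
        if not moved:
            break
    return labels


# Hypothetical data: two well-separated groups of three points, badly labelled.
sample_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                 (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
initial_labels = [0, 1, 0, 1, 0, 1]
final_labels = interchange(sample_points, initial_labels, k=2)
```

The linear scan inside `dist_to_group` is what makes each pass quadratic. Per the abstract, the paper's data structures replace such exact proximity computations with approximations.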