The use of non-hierarchical allocation methods for clustering large sets of data
Australian Computer Journal
Robust regression and outlier detection
The design and analysis of spatial data structures
Graph drawing by force-directed placement
Software—Practice & Experience
BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Analysis of aggregation errors for the p-median problem
Computers and Operations Research - Special issue on aggregation and disaggregation in operations research
The demand partitioning method for reducing aggregation errors in p-median problems
Computers and Operations Research - Special issue on aggregation and disaggregation in operations research
Efficient and Effective Clustering Methods for Spatial Data Mining
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
SSD '95 Proceedings of the 4th International Symposium on Advances in Spatial Databases
The BANG-Clustering System: Grid-Based Data Analysis
IDA '97 Proceedings of the Second International Symposium on Advances in Intelligent Data Analysis, Reasoning about Data
STING: A Statistical Information Grid Approach to Spatial Data Mining
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Fast and robust general purpose clustering algorithms
PRICAI'00 Proceedings of the 6th Pacific Rim international conference on Artificial intelligence
Statistical principles suggest minimizing the total within-group distance (TWGD) as a robust criterion for clustering point data associated with a Geographical Information System [17]. Although this NP-hard problem admits a linear programming formulation, in practice it must be solved with heuristic methods. The heuristics proposed so far require quadratic time, which is prohibitively expensive for data mining applications. This paper introduces data structures for managing large two-dimensional point data sets and for fast clustering via interchange heuristics. These structures avoid quadratic running time by approximating proximity information. We illustrate the scheme with two-dimensional quadtrees, but it can be extended to other structures suited to three-dimensional data or to spatial data with timestamps. The result is a fast and robust clustering method.
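The abstract does not spell out the objective or the heuristic, so the following is only a rough sketch under stated assumptions. It assumes the medoid (p-median) form of a within-group distance criterion, where each point is charged its Euclidean distance to the nearest of k representative points, and it applies a plain swap-based interchange heuristic. The names `medoid_cost` and `interchange` are hypothetical. This naive version recomputes the full objective for every candidate swap, so it is the kind of quadratic-or-worse baseline that the paper's quadtree structures are meant to avoid. It is not the authors' accelerated method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def medoid_cost(points: Sequence[Point], medoids: Sequence[int]) -> float:
    """Assumed objective (p-median form, hypothetical stand-in for TWGD):
    sum over all points of the distance to the nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def interchange(points: Sequence[Point], k: int) -> Tuple[List[int], float]:
    """Naive swap-based interchange heuristic (illustrative only).

    Starts with the first k points as medoids, then repeatedly replaces a
    medoid with a non-medoid whenever the swap lowers the cost, until a full
    pass finds no improving swap. Each candidate swap recomputes the whole
    objective, which costs O(n * k) distance evaluations.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    medoids = list(range(k))
    best = medoid_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue  # already a representative
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = medoid_cost(points, trial)
                if cost < best:  # accept strictly improving swaps only
                    medoids, best = trial, cost
                    improved = True
    return sorted(medoids), best


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(interchange(pts, 2))  # → ([0, 3], 4.0)
```

On six points forming two well-separated triples, the heuristic settles on one medoid per triple. Accepting only strictly improving swaps guarantees termination, because the cost decreases at every accepted step and there are finitely many medoid sets.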