Spatial data mining presents new challenges because spatial data sets are large and high-dimensional. A common response is to compress the initial database and then process the compressed data. This paper presents a novel spatial data compression method, called GraphZip, that produces a compact representation of the original data set. GraphZip has two advantages. First, the spatial pattern of the original data set is preserved in the compressed data. Second, data of arbitrary dimensionality can be processed efficiently and automatically. Applying GraphZip to huge databases can improve both the effectiveness and the efficiency of spatial data clustering: a clustering algorithm run on the compressed data set needs less running time and can still discover the pattern, and the complexity of clustering is dramatically reduced. A general hierarchical clustering method using GraphZip is also proposed. Experimental studies on four benchmark spatial data sets produce very encouraging results.
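The compress-then-cluster idea described above can be illustrated with a minimal sketch. This is not GraphZip, whose procedure is not given in this abstract. As a stand-in, the sketch compresses points by replacing each occupied grid cell with its centroid, then runs a naive single-linkage hierarchical clustering on the centroids only. The function names `compress_by_grid` and `single_link`, the `cell_size` parameter, and the sample points are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist, floor


def compress_by_grid(points, cell_size):
    """Stand-in compressor (NOT GraphZip): one centroid per occupied grid cell.

    Works for points of any dimensionality; returns (centroid, count) pairs.
    """
    cells = defaultdict(list)
    for p in points:
        key = tuple(floor(x / cell_size) for x in p)
        cells[key].append(p)
    reps = []
    for members in cells.values():
        n = len(members)
        centroid = tuple(sum(coord) / n for coord in zip(*members))
        reps.append((centroid, n))
    return reps


def single_link(reps, k):
    """Naive single-linkage clustering: merge the two closest clusters until k remain."""
    clusters = [[centroid] for centroid, _ in reps]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [
    (0.1, 0.2), (0.3, 0.1), (1.2, 0.4), (1.4, 0.3),
    (8.1, 8.2), (8.3, 8.4), (9.2, 8.1),
]
reps = compress_by_grid(points, cell_size=1.0)
clusters = single_link(reps, k=2)
print(len(points), "points compressed to", len(reps), "representatives")
print(clusters)
```

The clustering step here costs roughly cubic time in the number of items it is given, so running it on the representatives rather than on the raw points is where the savings would come from. How well the spatial pattern survives depends entirely on the compressor, which is the part the paper's method addresses.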