GraphZip: a fast and automatic compression method for spatial data clustering

Authors:
Yu Qian; Kang Zhang
Affiliations:
The University of Texas at Dallas, Richardson, TX (both authors)
Venue:
Proceedings of the 2004 ACM symposium on Applied computing
Year:
2004

Abstract

Spatial data mining presents new challenges due to the large size and high dimensionality of spatial data. A common approach to such challenges is to compress the initial database and then process the compressed data. This paper presents a novel spatial data compression method, called GraphZip, that produces a compact representation of the original data set. GraphZip has two advantages. First, the spatial pattern of the original data set is preserved in the compressed data. Second, data of arbitrary dimensionality can be processed efficiently and automatically. Applying GraphZip to huge databases can enhance both the efficiency and the effectiveness of spatial data clustering: running a clustering algorithm on the compressed data set requires less time while the pattern can still be discovered, and the complexity of clustering is dramatically reduced. A general hierarchical clustering method using GraphZip is also proposed. Experimental studies on four benchmark spatial data sets produce very encouraging results.