We consider the following problem: given an on-line, possibly unbounded stream of two-dimensional (2D) points, how can we summarize its spatial distribution or shape using a small, bounded amount of memory? We propose a novel scheme, called ClusterHull, which represents the shape of the stream as a dynamic collection of convex hulls with a total of at most m vertices, where m is the size of the memory. The algorithm dynamically adjusts both the number of hulls and the number of vertices in each hull to best represent the stream within its fixed memory budget. It addresses a problem of increasingly recognized importance: summarizing real-time data streams to enable on-line analytical processing. As a motivating example, consider habitat monitoring using wireless sensor networks. The sensors produce a steady stream of geographic data, namely, the locations of the objects being tracked. To conserve their limited resources (power, bandwidth, and storage), the sensors can compute, store, and exchange ClusterHull summaries of their data without losing important geometric information. We are not aware of other schemes specifically designed for capturing shape information in geometric data streams, so we compare ClusterHull with some of the best general-purpose clustering schemes, such as CURE, k-medians, and LSEARCH. We show through experiments that ClusterHull represents the shape of two-dimensional data streams more faithfully and flexibly than the stream versions of these clustering algorithms.
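To make the idea of a vertex-budgeted collection of convex hulls concrete, here is a minimal Python sketch. It is not the ClusterHull algorithm itself: the abstract does not specify how hulls are chosen for merging or refinement, so the perimeter-based cost, the class name `HullSummary`, and all helper names below are assumptions made purely for illustration. The sketch keeps a list of hulls, ignores points that are already covered, and, whenever the total vertex count exceeds m, repeatedly applies whichever of two actions changes total perimeter the least: merging two hulls or dropping one vertex.

```python
import math
from itertools import combinations

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def perimeter(hull):
    """Boundary length: 0 for a point, twice the length for a segment."""
    n = len(hull)
    return sum(math.dist(hull[i], hull[(i + 1) % n]) for i in range(n))

def contains(hull, p):
    """True if p is inside or on a polygonal hull (or equals a vertex of a tiny one)."""
    if len(hull) < 3:
        return p in hull
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))

class HullSummary:
    """Hypothetical summary: a set of convex hulls with at most m vertices in total."""

    def __init__(self, m):
        assert m >= 3, "need room for at least one triangle"
        self.m = m
        self.hulls = []

    def size(self):
        return sum(len(h) for h in self.hulls)

    def insert(self, p):
        p = tuple(p)
        if any(contains(h, p) for h in self.hulls):
            return  # already covered: no memory spent
        self.hulls.append([p])  # start as a singleton hull
        while self.size() > self.m:
            self._shrink_once()

    def _shrink_once(self):
        # Assumed cost (not from the source): change in total perimeter.
        best = None  # (cost, resulting list of hulls)
        for i, j in combinations(range(len(self.hulls)), 2):
            a, b = self.hulls[i], self.hulls[j]
            merged = convex_hull(a + b)
            cost = perimeter(merged) - perimeter(a) - perimeter(b)
            if best is None or cost < best[0]:
                rest = [h for k, h in enumerate(self.hulls) if k not in (i, j)]
                best = (cost, rest + [merged])
        for i, h in enumerate(self.hulls):
            if len(h) < 4:
                continue  # keep every polygon at least a triangle
            for v in range(len(h)):
                smaller = h[:v] + h[v + 1:]  # still convex, same orientation
                cost = perimeter(h) - perimeter(smaller)
                if best is None or cost < best[0]:
                    best = (cost, self.hulls[:i] + [smaller] + self.hulls[i + 1:])
        self.hulls = best[1]
```

A caller would create `HullSummary(m)` once and call `insert((x, y))` for each arriving point; `hulls` then holds the current summary. Each shrink step either removes a hull or removes a vertex, so the loop terminates, but it rescans all pairs of hulls every time, which is acceptable for a sketch and too slow for a real stream processor.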