Mining informative patterns from very large, dynamically changing databases poses numerous challenges. Data summarizations (e.g., data bubbles) have been proposed to compress very large static databases into representative points suitable for subsequent hierarchical cluster analysis. In many real-world applications, however, databases change dynamically through frequent insertions and deletions, which may alter the data distribution and the clustering structure over time. Reapplying both the data summarization and the clustering algorithm from scratch after such updates, in order to detect changes in the clustering structure and refresh the discovered patterns, is prohibitively expensive for large, fast-changing databases. In this paper, we propose a new scheme to maintain data bubbles incrementally. With incremental data bubbles, a high-quality hierarchical clustering is quickly available at any point in time. Our scheme uses a quality measure for incremental data bubbles to identify those bubbles that no longer compress their underlying data points well after a series of insertions and deletions. Only these data bubbles are rebuilt, using efficient split and merge operations. An extensive experimental evaluation shows that incremental data bubbles provide significantly faster data summarization than completely rebuilding the data bubbles after a certain number of insertions and deletions, and that they preserve (and in some cases even improve) the quality of the data summarization.
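To make the idea of incremental maintenance concrete, the following is a minimal sketch and not the scheme proposed in the paper. It assumes that each summary keeps the sufficient statistics (point count, linear sum, sum of squared norms), so that insertions, deletions, and merges are constant-time updates. The names `Bubble` and `flag_poor_bubbles`, the parameter `k`, and the count-based check are hypothetical stand-ins for illustration; the abstract does not specify the actual quality measure.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Bubble:
    """Hypothetical summary: count, linear sum, and sum of squared norms."""
    dim: int
    n: int = 0
    ls: list = field(default_factory=list)  # per-dimension linear sum
    ss: float = 0.0                         # sum of squared norms

    def __post_init__(self):
        if not self.ls:
            self.ls = [0.0] * self.dim

    def insert(self, p):
        """Absorb a point by updating the statistics only."""
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, p)]
        self.ss += sum(x * x for x in p)

    def delete(self, p):
        """Remove a point; the caller must ensure it was inserted earlier."""
        if self.n <= 0:
            raise ValueError("cannot delete from an empty bubble")
        self.n -= 1
        self.ls = [a - b for a, b in zip(self.ls, p)]
        self.ss -= sum(x * x for x in p)

    def representative(self):
        """Mean of the summarized points."""
        return [a / self.n for a in self.ls]

    def extent(self):
        """Root mean squared pairwise distance, derived from the statistics."""
        if self.n < 2:
            return 0.0
        ls_sq = sum(a * a for a in self.ls)
        sq = (2 * self.n * self.ss - 2 * ls_sq) / (self.n * (self.n - 1))
        return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors

    def merge(self, other):
        """Return a new bubble summarizing both inputs."""
        merged_ls = [a + b for a, b in zip(self.ls, other.ls)]
        return Bubble(self.dim, self.n + other.n, merged_ls, self.ss + other.ss)


def flag_poor_bubbles(bubbles, k=2.0):
    """Assumed placeholder check: flag bubbles whose share of all points
    lies more than k standard deviations from the mean share."""
    total = sum(b.n for b in bubbles)
    if total == 0:
        return []
    shares = [b.n / total for b in bubbles]
    mean = sum(shares) / len(shares)
    std = math.sqrt(sum((s - mean) ** 2 for s in shares) / len(shares))
    return [i for i, s in enumerate(shares) if abs(s - mean) > k * std]
```

In such a sketch, only the flagged bubbles would be candidates for rebuilding. A split needs access to the underlying points, which these statistics alone do not retain, so it is omitted here.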