Dynamic incremental data summarization for hierarchical clustering

Authors:
Bing Liu;Yuliang Shi;Zhihui Wang;Wei Wang;Baile Shi
Affiliations:
Department of Computing and Information Technology, Fudan University, Shanghai, China;Department of Computing and Information Technology, Fudan University, Shanghai, China;Department of Computing and Information Technology, Fudan University, Shanghai, China;Department of Computing and Information Technology, Fudan University, Shanghai, China;Department of Computing and Information Technology, Fudan University, Shanghai, China
Venue:
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
Year:
2006

Citing 10
Cited 0

BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Incremental clustering and dynamic information retrieval

STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
OPTICS: ordering points to identify the clustering structure

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Fast and effective text mining using linear-time document clustering

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Data bubbles: quality preserving performance boosting for hierarchical clustering

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Incremental Clustering for Mining in a Data Warehousing Environment

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
An Incremental Hierarchical Data Clustering Algorithm Based on Gravity Theory

PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
An Incremental Approach to Building a Cluster Hierarchy

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Incremental and effective data summarization for dynamic hierarchical clustering

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Automatic extraction of clusters from hierarchical clustering representations

PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

In many real world applications, with the databases frequent insertions and deletions, the ability of a data mining technique to detect and react quickly to dynamic changes in the data distribution and clustering over time is highly desired. Data summarizations (e.g., data bubbles) have been proposed to compress large databases into representative points suitable for subsequent hierarchical cluster analysis. In this paper, we thoroughly investigate the quality measure (data summarization index) of incremental data bubbles. When updating databases, we show which factors could affect the mean and standard deviation of data summarization index or not. Based on these statements, a fully dynamic scheme to maintain data bubbles incrementally is proposed. An extensive experimental evaluation confirms our statements and shows that the fully dynamic incremental data bubbles are effective in preserving the quality of the data summarization for hierarchical clustering.