Dynamic incremental data summarization for hierarchical clustering

  • Authors:
  • Bing Liu;Yuliang Shi;Zhihui Wang;Wei Wang;Baile Shi

  • Affiliations:
  • Department of Computing and Information Technology, Fudan University, Shanghai, China (all authors)

  • Venue:
  • WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
  • Year:
  • 2006

Abstract

In many real-world applications, databases undergo frequent insertions and deletions, so it is highly desirable for a data mining technique to detect and react quickly to dynamic changes in the data distribution and in the clustering over time. Data summarizations (e.g., data bubbles) have been proposed to compress large databases into representative points suitable for subsequent hierarchical cluster analysis. In this paper, we thoroughly investigate the quality measure (the data summarization index) of incremental data bubbles. We show which factors do and do not affect the mean and standard deviation of the data summarization index when the database is updated. Based on these results, we propose a fully dynamic scheme for maintaining data bubbles incrementally. An extensive experimental evaluation confirms our analysis and shows that the fully dynamic incremental data bubbles effectively preserve the quality of the data summarization for hierarchical clustering.
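
As a rough illustration of the idea in the abstract, the sketch below keeps one summary per group of points and updates it as points are inserted or deleted. It is not the paper's scheme. The class `Bubble`, the function `index_stats`, and the choice of sufficient statistics (count, linear sum, sum of squared norms) are assumptions made for illustration. In particular, the abstract does not define the data summarization index, so the sketch assumes it is the fraction of all database points held by a bubble.

```python
import math


class Bubble:
    """Hypothetical summary of a group of points via sufficient statistics."""

    def __init__(self, dim):
        self.n = 0                 # number of points summarized
        self.ls = [0.0] * dim      # per-dimension linear sum
        self.ss = 0.0              # sum of squared norms

    def insert(self, p):
        self.n += 1
        self.ls = [a + x for a, x in zip(self.ls, p)]
        self.ss += sum(x * x for x in p)

    def delete(self, p):
        # Assumes p was previously inserted into this bubble.
        self.n -= 1
        self.ls = [a - x for a, x in zip(self.ls, p)]
        self.ss -= sum(x * x for x in p)

    def representative(self):
        # Mean of the summarized points; requires a non-empty bubble.
        return [a / self.n for a in self.ls]


def index_stats(bubbles):
    """Return per-bubble index values with their mean and standard deviation.

    Assumption: the index of a bubble is its share of all summarized points.
    """
    total = sum(b.n for b in bubbles)
    idx = [b.n / total for b in bubbles]
    mean = sum(idx) / len(idx)
    std = math.sqrt(sum((v - mean) ** 2 for v in idx) / len(idx))
    return idx, mean, std


a, b = Bubble(2), Bubble(2)
a.insert((0.0, 0.0))
a.insert((2.0, 2.0))
b.insert((10.0, 10.0))
idx_before, mean_before, std_before = index_stats([a, b])

a.delete((2.0, 2.0))   # a deletion shifts the index distribution
idx_after, mean_after, std_after = index_stats([a, b])
```

Under this assumed definition the mean of the index depends only on the number of bubbles, while its standard deviation changes as updates unbalance the bubbles. A dynamic scheme could watch that spread to decide when a bubble needs to be rebuilt, though the criteria the authors actually use are not stated in the abstract.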