Density-based hierarchical clustering for streaming data

Authors:
Q. Tu;J. F. Lu;B. Yuan;J. B. Tang;J. Y. Yang
Affiliations:
School of Computer Science, Nanjing University of Science & Technology, Nanjing, China;School of Computer Science, Nanjing University of Science & Technology, Nanjing, China;Intelligent Computing Lab, Division of Informatics, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China;China Telecom Jiangsu Corp., Nanjing, China;School of Computer Science, Nanjing University of Science & Technology, Nanjing, China
Venue:
Pattern Recognition Letters
Year:
2012

Abstract

For streaming data that arrive continuously, such as multimedia data and financial transactions, clustering algorithms are typically allowed to scan the data set only once. Existing research in this domain mainly focuses on improving the time efficiency of clustering. In this paper, a novel density-based hierarchical clustering scheme for streaming data, built on the agglomerative clustering framework, is proposed in order to improve both accuracy and effectiveness. Traditionally, clustering algorithms for streaming data use the cluster center to represent the whole cluster when merging clusters, which may lead to unsatisfactory results. We argue that even if the data set is accessed only once, some parameters, such as the within-cluster variance, the intra-cluster density and the inter-cluster distance, can be calculated accurately. This may bring measurable benefits to the process of cluster merging. Furthermore, we employ a general framework that can incorporate different merging criteria and that, given the same criteria, produces similar clustering results for both streaming and non-streaming data. In experimental studies, the proposed method demonstrates promising results with reduced time and space complexity.
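The claim that per-cluster statistics can be computed exactly in a single pass can be illustrated with a minimal sketch. It is not the authors' algorithm: it uses the generic sufficient-statistics idea (point count, per-dimension sum, sum of squared norms), and the class name `ClusterSummary`, its methods, and the helper `centroid_distance` are hypothetical names chosen for illustration. The abstract does not define the intra-cluster density, so it is left out here.

```python
import math


class ClusterSummary:
    """Hypothetical one-pass summary of a cluster (not the paper's data structure).

    Stores only the point count, the per-dimension sum, and the sum of
    squared norms, so each point can be discarded after it is seen.
    """

    def __init__(self, dim):
        self.n = 0
        self.linear_sum = [0.0] * dim
        self.square_sum = 0.0

    def add(self, point):
        """Absorb a single point from the stream."""
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
        self.square_sum += sum(x * x for x in point)

    def merge(self, other):
        """Return the summary of the union of two clusters."""
        merged = ClusterSummary(len(self.linear_sum))
        merged.n = self.n + other.n
        merged.linear_sum = [a + b for a, b in zip(self.linear_sum, other.linear_sum)]
        merged.square_sum = self.square_sum + other.square_sum
        return merged

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def variance(self):
        """Mean squared distance to the centroid: E[|x|^2] - |mean|^2."""
        c = self.centroid()
        # Clamp at zero to absorb small negative values from rounding.
        return max(self.square_sum / self.n - sum(v * v for v in c), 0.0)


def centroid_distance(a, b):
    """Euclidean distance between two cluster centroids (one possible inter-cluster distance)."""
    return math.dist(a.centroid(), b.centroid())


left, right = ClusterSummary(2), ClusterSummary(2)
for p in [(0.0, 0.0), (2.0, 0.0)]:
    left.add(p)
for p in [(10.0, 0.0), (12.0, 0.0)]:
    right.add(p)

both = left.merge(right)
print(both.centroid(), both.variance(), centroid_distance(left, right))
```

Because the three stored quantities are additive, merging two summaries gives the same centroid and variance as recomputing them from the raw points, which is why a merge criterion need not rely on the cluster center alone. The subtraction in `variance` can lose precision when the variance is small relative to the squared centroid norm, so a production version might prefer a numerically stabler update.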