Continuously identifying representatives out of massive streams

Authors:
Qiong Li;Xiuli Ma;Shiwei Tang;Shuiyuan Xie
Affiliations:
School of Electronics Engineering and Computer Science, Peking University, Beijing, China;School of Electronics Engineering and Computer Science, Peking University, Beijing, China;School of Electronics Engineering and Computer Science, Peking University, Beijing, China;School of Electronics Engineering and Computer Science, Peking University, Beijing, China
Venue:
ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part I
Year:
2011

Abstract

A growing number of emerging applications monitor multiple data streams concurrently, with data flowing continuously from many concurrent sources. In such large-scale, real-time monitoring applications, continuously identifying representatives out of massive streams is an important task: the representatives capture key trends and so support online monitoring and analysis. In this paper, we present a framework for continuously extracting representatives out of massive streams. The framework identifies and tracks representatives using the core clustering technique. We adapt the core clustering model to the streaming setting and propose a method for extracting representatives that exploits a key property of core clusters: their core sets are tight. To identify representatives continuously and efficiently, we run the online representative-adjustment process only when significant cluster evolution occurs. Our experimental studies show that the algorithm is effective and efficient.
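The abstract does not specify how a representative is chosen from a core set or what counts as "significant" cluster evolution. The Python sketch below illustrates the two ideas under our own assumptions and is not the authors' algorithm. It assumes (1) tightness is measured by Pearson correlation over a sliding window, so the representative is the core member with the highest mean correlation to the other core members, and (2) "significant evolution" means the Jaccard distance between a cluster's current core set and its core set at the last recomputation exceeds a threshold. The names `pick_representative`, `LazyRepresentativeTracker`, `jaccard_distance`, and `threshold` are hypothetical.

```python
import math
from typing import Dict, FrozenSet, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length windows (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def pick_representative(core: Sequence[str], windows: Dict[str, Sequence[float]]) -> str:
    """Hypothetical rule: return the core member with the highest mean
    correlation to the other core members (ties go to the first in `core`)."""
    if len(core) == 1:
        return core[0]

    def score(s: str) -> float:
        others = [t for t in core if t != s]
        return sum(pearson(windows[s], windows[t]) for t in others) / len(others)

    return max(core, key=score)


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """1 - |A & B| / |A | B|; two empty sets are at distance 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


class LazyRepresentativeTracker:
    """Recompute a cluster's representative only when its core set has drifted
    by more than `threshold` (Jaccard distance) since the last recomputation,
    or when the current representative has left the core set."""

    def __init__(self, threshold: float = 0.3) -> None:
        self.threshold = threshold
        self.last_core: Dict[int, FrozenSet[str]] = {}
        self.reps: Dict[int, str] = {}
        self.recomputations = 0

    def update(self, cluster_id: int, core: Sequence[str],
               windows: Dict[str, Sequence[float]]) -> str:
        core_set = frozenset(core)
        old = self.last_core.get(cluster_id)
        if (old is None
                or jaccard_distance(old, core_set) > self.threshold
                or self.reps[cluster_id] not in core_set):
            # Significant change: rerun selection on a deterministic ordering.
            self.reps[cluster_id] = pick_representative(sorted(core_set), windows)
            self.last_core[cluster_id] = core_set
            self.recomputations += 1
        # Otherwise keep the cached representative without recomputing.
        return self.reps[cluster_id]
```

For example, with windows `a = [1, 2, 3, 4, 5]` and `b = [2, 4, 6, 8, 10]`, adding a single new stream to a three-member core set moves the Jaccard distance by only 0.25, so the tracker keeps the cached representative; replacing most of the core set triggers a recomputation. Caching the representative this way trades a small amount of staleness for avoiding a full pairwise-correlation pass on every window update.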