A robust approach to find effective items in distributed data streams

Authors:
Xiaoxia Rong;Jindong Wang
Affiliations:
School of Mathematics and System Sciences, Shandong University, Jinan, China;Shandong Computer Science Center, Jinan, China
Venue:
LSMS'07 Proceedings of the Life system modeling and simulation 2007 international conference on Bio-Inspired computational intelligence and applications
Year:
2007

Citing 9
Cited 0

Probabilistic counting algorithms for data base applications

Journal of Computer and System Sciences
Distributed streams algorithms for sliding windows

Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Data streams: algorithms and applications

SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
A simple algorithm for finding frequent elements in streams and bags

ACM Transactions on Database Systems (TODS)
Dynamic sample selection for approximate query processing

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Finding frequent items in data streams

Theoretical Computer Science - Special issue on automata, languages and programming
Synopsis diffusion for robust aggregation in sensor networks

SenSys '04 Proceedings of the 2nd international conference on Embedded networked sensor systems
Approximate frequency counts over data streams

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Quantified Score

Hi-index	0.00

Visualization

Abstract

A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Consequently, the knowledge embedded in a data stream is more likely to be changed as time goes by. Data items that appear frequently in data streams are called frequent data items, which often play a more important role than others in data streams management system. So how to identifying frequent items is one of key technologies. As distributed data streams management system is concerned, there are many input data streams having different effect on result, the pure character of frequency is unfit for finding the important data. To solve this problem, effective data of distributed data streams is defined in this paper, which combines the frequency of items and the weight of streams. Based on an optimization technique that is devised to minimize main memory usage, a robust mining approach is proposed. According to this algorithm, the effective data can be output with limited space cost. At the same time, the sensitivity of algorithm is analyzed which shows the output result is within the error given by the user. Finally a series of experiments show the efficiency of the mining algorithm.