A robust approach to find effective items in distributed data streams

  • Authors:
  • Xiaoxia Rong;Jindong Wang

  • Affiliations:
  • School of Mathematics and System Sciences, Shandong University, Jinan, China;Shandong Computer Science Center, Jinan, China

  • Venue:
  • LSMS'07 Proceedings of the Life system modeling and simulation 2007 international conference on Bio-Inspired computational intelligence and applications
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Consequently, the knowledge embedded in a data stream is more likely to be changed as time goes by. Data items that appear frequently in data streams are called frequent data items, which often play a more important role than others in data streams management system. So how to identifying frequent items is one of key technologies. As distributed data streams management system is concerned, there are many input data streams having different effect on result, the pure character of frequency is unfit for finding the important data. To solve this problem, effective data of distributed data streams is defined in this paper, which combines the frequency of items and the weight of streams. Based on an optimization technique that is devised to minimize main memory usage, a robust mining approach is proposed. According to this algorithm, the effective data can be output with limited space cost. At the same time, the sensitivity of algorithm is analyzed which shows the output result is within the error given by the user. Finally a series of experiments show the efficiency of the mining algorithm.