Using a real-time top-k algorithm to mine the most frequent items over multiple streams

  • Authors:
  • Ling Wang;Zhao Yang Qu;Tie Hua Zhou;Keun Ho Ryu

  • Affiliations:
  • Department of Computer Science and Technology, School of Information Engineering, Northeast Dianli University, Jilin, China;Department of Computer Science and Technology, School of Information Engineering, Northeast Dianli University, Jilin, China;Database/Bioinformatics Laboratory, School of Electrical & Computer Engineering, Chungbuk National University, Chungbuk, Korea;Database/Bioinformatics Laboratory, School of Electrical & Computer Engineering, Chungbuk National University, Chungbuk, Korea

  • Venue:
  • ICIC'13 Proceedings of the 9th international conference on Intelligent Computing Theories
  • Year:
  • 2013

Abstract

Applications such as sensor networks, Internet traffic analysis, location-based services, and health measurement must handle stream data that is unbounded, fast, high-volume, continuous, and sometimes distributed. A practical approach is to maintain a synopsis, that is, a list of partial summaries of the unknown item sets, which reduces memory usage enough to keep pace with fast, large incoming data. Item sets of different sizes generally lead to different summaries, and this matters especially for the Top-k operator, which acts as a partial preprocessing step over the synopsis. We therefore propose a smooth synopsis that dynamically assigns a numerical interval to resolve the item set, in order to maintain a more accurate list of approximate answers from partial Top-k processing. In particular, we propose an algorithm, called the SFI algorithm, that mines the most frequent items from specific stream sources in a more adaptive and faster way. Finally, our experimental results demonstrate the accuracy and efficiency of our approximation techniques.
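
The abstract does not specify how the smooth synopsis or the SFI algorithm works, so the following is only a generic illustration of the underlying idea: answering an approximate Top-k query from a bounded-memory synopsis of counters. It uses the well-known Space-Saving scheme, not the authors' method, and the names `space_saving_top_k`, `capacity`, and `k` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def space_saving_top_k(
    stream: Iterable[Hashable], capacity: int, k: int
) -> List[Tuple[Hashable, int]]:
    """Approximate top-k frequent items using at most `capacity` counters.

    Generic Space-Saving sketch (an assumption for illustration only;
    this is NOT the SFI algorithm from the paper).
    """
    if capacity < 1 or k < 1:
        raise ValueError("capacity and k must be positive")

    counters: Dict[Hashable, int] = {}  # the bounded synopsis
    for item in stream:
        if item in counters:
            # Item is already monitored: count it exactly.
            counters[item] += 1
        elif len(counters) < capacity:
            # Free slot available: start monitoring the item.
            counters[item] = 1
        else:
            # Synopsis is full: evict the smallest counter and let the
            # new item inherit its count plus one (an overestimate).
            victim = min(counters, key=counters.get)
            inherited = counters.pop(victim)
            counters[item] = inherited + 1

    # Rank by estimated count, breaking ties by string form of the item.
    ranked = sorted(counters.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return ranked[:k]


if __name__ == "__main__":
    demo_stream = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
    print(space_saving_top_k(demo_stream, capacity=3, k=2))
```

Counts reported by such a synopsis are estimates that can overstate an item's true frequency by at most the smallest counter value, which is the kind of accuracy versus memory trade-off the abstract refers to.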