Error-adaptive and time-aware maintenance of frequency counts over data streams

Authors:
Hongyan Liu;Ying Lu;Jiawei Han;Jun He
Affiliations:
Tsinghua University, China;University of Illinois, Urbana, Champaign;University of Illinois, Urbana, Champaign;Renmin University of China, China
Venue:
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
Year:
2006

Citing 7

Finding Frequent Items in Data Streams
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming

Frequency Estimation of Internet Packet Streams with Limited Space
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms

What's hot and what's not: tracking most frequent items dynamically
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Dynamically maintaining frequent items over a data stream
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management

Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

False positive or false negative: mining frequent itemsets from high speed transactional data streams
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Efficient computation of frequent and top-k elements in data streams
ICDT'05 Proceedings of the 10th international conference on Database Theory

Cited 1

Mining frequent items in data stream using time fading model
Information Sciences: an International Journal

Abstract

Maintaining frequency counts for items over data streams has a wide range of applications, such as web advertisement fraud detection. The problem has attracted considerable attention from both researchers and practitioners, and many algorithms have been proposed. In this paper, we propose a new method, error-adaptive pruning, to maintain frequency counts more accurately. We also propose a method called fractionization to record time information together with the frequency information. Using these two methods, we design three algorithms for finding frequent items and top-k frequent items. Experimental results show that these methods improve maintenance accuracy.
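
For orientation, the general idea of maintaining approximate frequency counts with periodic pruning can be sketched with Lossy Counting, the baseline from "Approximate frequency counts over data streams" listed under the cited works. This is not the paper's error-adaptive pruning or fractionization, which the abstract does not specify. The class name, method names, and parameters below are illustrative assumptions.

```python
import math


class LossyCounter:
    """Minimal Lossy Counting sketch (baseline technique, not the paper's method)."""

    def __init__(self, epsilon):
        self.epsilon = epsilon                # maximum relative undercount
        self.width = math.ceil(1 / epsilon)   # number of items per bucket
        self.n = 0                            # items seen so far
        self.entries = {}                     # item -> [count, max possible undercount]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # A new entry may have been pruned earlier, at most bucket - 1 times.
            self.entries[item] = [1, bucket - 1]
        # Prune at each bucket boundary: drop entries that cannot be frequent.
        if self.n % self.width == 0:
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > bucket
            }

    def frequent(self, support):
        """Return items whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}


if __name__ == "__main__":
    counter = LossyCounter(epsilon=0.1)
    # Hypothetical stream: "a" alternates with items that each occur once.
    for i in range(100):
        counter.add("a" if i % 2 == 0 else f"x{i}")
    print(counter.frequent(support=0.3))
```

In this sketch the pruning rule is fixed by `epsilon`. The abstract indicates that the paper's contribution is to make pruning adaptive to the error and to store time information alongside the counts, neither of which is modeled here.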