False-Negative frequent items mining from data streams with bursting

Authors:
Zhihong Chong;Jeffrey Xu Yu;Hongjun Lu;Zhengjie Zhang;Aoying Zhou
Affiliations:
Fudan University, China;Chinese University of Hong Kong, China;Hong Kong University of Science and Technology, China;Fudan University, China;Fudan University, China
Venue:
DASFAA'05 Proceedings of the 10th international conference on Database Systems for Advanced Applications
Year:
2005

Abstract

False-negative frequent items mining from a high-speed transactional data stream finds an approximate set of frequent items with respect to a minimum support threshold, s, and controls the possibility of missing frequent items with a reliability parameter, δ. Its importance is that it excludes false positives and can therefore significantly reduce the memory consumption of frequent itemsets mining. The key issue is how to minimize the possibility of missing frequent items. In this paper, we propose a new false-negative frequent items mining algorithm, called Loss-Negative, for handling bursting in data streams. The new algorithm consumes the least memory in comparison with other false-negative and false-positive frequent items algorithms. We present a theoretical bound for the new algorithm and analyze how well it minimizes the possibility of missing frequent items in terms of two possibilities, namely in-possibility and out-possibility. The former concerns how likely a frequent item is to pass the first pruning. The latter concerns how long a frequent item can stay in memory when no occurrences of the item arrive in the subsequent data stream for a certain period. The proposed algorithm is superior to the existing false-negative frequent items mining algorithms in terms of both possibilities. We demonstrate the effectiveness of the new algorithm in this paper.
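The abstract does not describe how Loss-Negative works, so the sketch below does not attempt to reproduce it. It only illustrates what "false-negative" means for a frequent-items summary, using a different and well-known technique: a Misra-Gries style counter summary, whose stored counts never exceed the true counts. Reporting only items whose stored count reaches s·n therefore yields no false positives, at the price of possibly missing some frequent items. The function names and the counter budget `k` are assumptions made for illustration, and the sketch is deterministic, so the reliability parameter δ does not appear in it.

```python
def misra_gries(stream, k):
    """Summarize a stream with at most k counters (Misra-Gries style).

    Each stored count is a lower bound on the item's true count.
    This is an illustrative stand-in, not the Loss-Negative algorithm.
    """
    counters = {}
    n = 0
    for item in stream:
        n += 1
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # No free counter: decrement all counters, drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters, n


def report_without_false_positives(stream, s, k):
    """Report items whose lower-bound count reaches the support s * n.

    Every reported item is truly frequent; a truly frequent item may be
    missed (a false negative) if its counter was decremented or evicted.
    """
    counters, n = misra_gries(stream, k)
    return {item for item, count in counters.items() if count >= s * n}


if __name__ == "__main__":
    stream = ["a"] * 6 + ["b"] * 3 + ["c"]
    print(report_without_false_positives(stream, s=0.25, k=2))
```

On this ten-item stream with s = 0.25 and k = 2, only `a` is reported. Item `b` occurs 3 times and is truly frequent, but the arrival of `c` decrements its stored count to 2, below the threshold of 2.5, so it becomes a false negative. This is the kind of loss that the in-possibility and out-possibility analysis in the abstract is concerned with limiting.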