Maintaining frequent itemsets over high-speed data streams

Authors:
James Cheng;Yiping Ke;Wilfred Ng
Affiliations:
Department of Computer Science, Hong Kong University of Science and Technology, Kowloon, Hong Kong, China;Department of Computer Science, Hong Kong University of Science and Technology, Kowloon, Hong Kong, China;Department of Computer Science, Hong Kong University of Science and Technology, Kowloon, Hong Kong, China
Venue:
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Year:
2006

Citing (3):

estWin: adaptively monitoring the recent change of frequent itemsets over online data streams
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management

Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

False positive or false negative: mining frequent itemsets from high speed transactional data streams
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Cited by (12):

A survey on algorithms for mining frequent itemsets over data streams
Knowledge and Information Systems

DELAY: A Lazy Approach for Mining Frequent Patterns over High Speed Data Streams
ADMA '07 Proceedings of the 3rd international conference on Advanced Data Mining and Applications

Efficient Approximate Mining of Frequent Patterns over Transactional Data Streams
DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery

A novel hash-based approach for mining frequent itemsets over data streams requiring less memory space
Data Mining and Knowledge Discovery

Which Is Better for Frequent Pattern Mining: Approximate Counting or Sampling?
DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery

A comparison between approximate counting and sampling methods for frequent pattern mining on data streams
Intelligent Data Analysis

Mining informative rule set for prediction over a sliding window
ACIIDS'10 Proceedings of the Second international conference on Intelligent information and database systems: Part II

Discovery of frequent patterns in transactional data streams
Transactions on large-scale data- and knowledge-centered systems II

Discovery of frequent patterns in transactional data streams
Transactions on large-scale data- and knowledge-centered systems II

Mining frequent itemsets over distributed data streams by continuously maintaining a global synopsis
Data Mining and Knowledge Discovery

Search method of time sensitive frequent itemsets in data streams
CIARP'06 Proceedings of the 11th Iberoamerican conference on Progress in Pattern Recognition, Image Analysis and Applications

Using a real-time top-k algorithm to mine the most frequent items over multiple streams
ICIC'13 Proceedings of the 9th international conference on Intelligent Computing Theories

Abstract

We propose a false-negative approach to approximate the set of frequent itemsets (FIs) over a sliding window. Existing approximate algorithms use an error parameter, ε, to control the accuracy of the mining result. However, the use of ε leads to a dilemma. A smaller ε gives a more accurate mining result but higher computational complexity, while increasing ε degrades the mining accuracy. We address this dilemma by introducing a progressively increasing minimum support function. When an itemset is retained in the window longer, we require its minimum support to approach the minimum support of an FI. Thus, the number of potential FIs to be maintained is greatly reduced. Our experiments show that our algorithm not only attains highly accurate mining results, but also runs significantly faster and consumes less memory than do existing algorithms for mining FIs over a sliding window.
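
The abstract does not give the form of the progressively increasing minimum support function, so the following is only a minimal sketch of the idea under stated assumptions. It assumes a linear ramp from a relaxed threshold eps * sigma (for an itemset that has just entered the window) up to the full threshold sigma (for an itemset retained for the whole window). The function names, the parameters `k` and `window_units`, and the linear shape are hypothetical and are not taken from the paper.

```python
def progressive_min_support(k: int, window_units: int, sigma: float, eps: float) -> float:
    """Relative support threshold for an itemset retained for k time units.

    Assumed shape (NOT from the paper): a linear ramp from the relaxed
    threshold eps * sigma at k = 1 to the full threshold sigma at
    k = window_units.
    """
    if not 1 <= k <= window_units:
        raise ValueError("k must lie in [1, window_units]")
    if not (0.0 < sigma <= 1.0 and 0.0 <= eps <= 1.0):
        raise ValueError("sigma must be in (0, 1] and eps in [0, 1]")
    if window_units == 1:
        return sigma
    relaxed = eps * sigma
    return relaxed + (sigma - relaxed) * (k - 1) / (window_units - 1)


def keep_candidate(count: int, transactions_seen: int, k: int,
                   window_units: int, sigma: float, eps: float) -> bool:
    """Hypothetical pruning test: keep an itemset as a potential FI only if
    its count meets the threshold that applies to its retention time k."""
    threshold = progressive_min_support(k, window_units, sigma, eps)
    return count >= threshold * transactions_seen


if __name__ == "__main__":
    # An itemset seen 60 times in 1000 transactions, sigma = 0.1, eps = 0.5.
    for k in range(1, 6):
        print(k, keep_candidate(60, 1000, k, 5, 0.1, 0.5))
```

In this sketch an itemset with the same relative count is kept while it is new and dropped once it has been retained long enough for the threshold to pass its support. That is how a rising threshold could shrink the set of potential FIs, at the cost of possibly missing some true FIs (the false-negative side of the trade-off).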