A survey on algorithms for mining frequent itemsets over data streams

Authors:
James Cheng; Yiping Ke; Wilfred Ng
Affiliations:
The Hong Kong University of Science and Technology (HKUST), Department of Computer Science and Engineering, Clear Water Bay, Kowloon, Hong Kong (all three authors)
Venue:
Knowledge and Information Systems
Year:
2008

Abstract

The increasing prominence of data streams arising in a wide range of advanced applications, such as fraud detection and trend learning, has led to the study of online mining of frequent itemsets (FIs). Unlike mining static databases, mining data streams poses many new challenges. In addition to the one-scan nature of data streams, their unbounded memory requirement and their high data arrival rate, the combinatorial explosion of itemsets exacerbates the mining task. The high complexity of the FI mining problem hinders the application of stream mining techniques. We recognize that a critical review of existing techniques is needed in order to design and develop efficient mining algorithms and data structures whose processing rate can keep up with the high arrival rate of data streams. Using a unified set of notation and terminology, we describe the main efforts and techniques for mining data streams and present a comprehensive survey of state-of-the-art algorithms for mining frequent itemsets over data streams. To provide insight into how and why the techniques are useful, we classify them into two categories based on the window model they adopt. We then further analyze the algorithms according to whether they are exact or approximate and, for approximate approaches, whether they are false-positive or false-negative. We also discuss various issues of interest, including the merits and limitations of existing research and substantive areas for future research.
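A concrete example of the false-positive category is Lossy Counting (Manku and Motwani, "Approximate frequency counts over data streams", VLDB 2002), which mines over a landmark window. What follows is a minimal sketch under simplifying assumptions, not the surveyed itemset algorithm. It counts single items rather than itemsets, whereas the published itemset variant processes transactions in batches. The class name `LossyCounter` and its methods are hypothetical. With error bound ε and N items seen so far, each stored count undercounts the true count by at most εN. A query for support threshold σ returns every item whose true count is at least σN. It may also return false positives whose true count lies between (σ − ε)N and σN.

```python
import math
from typing import Dict, Hashable, List, Tuple


class LossyCounter:
    """Sketch of Lossy Counting for single items over a landmark window.

    Hypothetical helper class; the itemset version in the literature batches
    whole transactions instead of processing one item at a time.
    """

    def __init__(self, epsilon: float) -> None:
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must lie in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width w = ceil(1/epsilon)
        self.n = 0  # N: number of items seen since the landmark (stream start)
        # item -> (f, delta): f = counted occurrences, delta = max possible undercount
        self.entries: Dict[Hashable, Tuple[int, int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            f, delta = self.entries[item]
            self.entries[item] = (f + 1, delta)
        else:
            # The item may have been seen and pruned in earlier buckets.
            self.entries[item] = (1, bucket - 1)
        if self.n % self.width == 0:
            # Bucket boundary: drop entries whose f + delta does not exceed the bucket id.
            self.entries = {
                k: (f, d) for k, (f, d) in self.entries.items() if f + d > bucket
            }

    def query(self, sigma: float) -> List[Hashable]:
        """Return items with stored count >= (sigma - epsilon) * N.

        Every item with true count >= sigma * N is included (no false negatives);
        some items with true count in [(sigma - epsilon) * N, sigma * N) may also
        appear (false positives).
        """
        threshold = (sigma - self.epsilon) * self.n
        return [k for k, (f, _) in self.entries.items() if f >= threshold]


# Usage: "a" occurs 10 times and "b" 3 times in a stream of N = 20 items.
counter = LossyCounter(epsilon=0.1)
for x in ["a", "b", "a", "c", "a", "d", "a", "e", "a", "f",
          "a", "b", "a", "g", "a", "h", "a", "b", "a", "i"]:
    counter.add(x)
print(counter.query(0.3))    # ['a']
print(counter.query(0.15))   # ['a', 'b']  (b's true count 3 equals 0.15 * 20)
print(counter.entries["b"])  # (2, 1): stored count 2, true count 3, undercount <= delta
```

Memory stays bounded because entries are pruned at every bucket boundary. The cost is that counts become approximate and some infrequent items can be reported. False-negative approaches make the opposite trade-off: they may miss some frequent itemsets, and in return they avoid reporting infrequent ones.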