An Algorithm for In-Core Frequent Itemset Mining on Streaming Data

Authors:
Ruoming Jin; Gagan Agrawal
Affiliations:
Kent State University; Ohio State University
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

Abstract

Frequent itemset mining is a core data mining operation and has been extensively studied over the last decade. This paper takes a new approach to this problem and makes two major contributions. First, we present a one-pass algorithm for frequent itemset mining that has deterministic bounds on accuracy and does not require any out-of-core summary structure. Second, because our one-pass algorithm does not produce any false negatives, it can be easily extended to an accurate two-pass algorithm. Our two-pass algorithm is very memory efficient and allows mining of datasets with a large number of distinct items and/or very low support levels. Our detailed experimental evaluation on synthetic and real datasets shows the following. First, our one-pass algorithm is very accurate in practice. Second, our algorithm requires significantly less memory than Manku and Motwani's one-pass algorithm and the multi-pass Apriori algorithm. Third, our two-pass algorithm outperforms Apriori and FP-tree when the number of distinct items is large and/or support levels are very low. In other cases, it is quite competitive, with the possible exception of cases where the average length of frequent itemsets is quite high.
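
The link between "no false negatives in one pass" and "an accurate result in two passes" can be illustrated with a small sketch. The code below is not the algorithm from this paper: it uses the classic Misra-Gries counter technique on single items (not itemsets) as a stand-in, and the function names `one_pass_candidates` and `two_pass_frequent` are hypothetical. It only shows the general idea that a bounded-memory first pass which never misses a frequent item can be followed by a second pass that counts the surviving candidates exactly.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Sequence


def one_pass_candidates(stream: Iterable[Hashable], theta: float) -> List[Hashable]:
    """Misra-Gries style pass over single items (illustrative stand-in).

    Keeps at most ceil(1/theta) - 1 counters. Every item whose true
    frequency exceeds theta * n is guaranteed to survive (no false
    negatives), but some infrequent items may survive as well.
    """
    if not 0 < theta < 1:
        raise ValueError("theta must be in (0, 1)")
    k = math.ceil(1 / theta)
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return list(counters)


def two_pass_frequent(data: Sequence[Hashable], theta: float) -> Dict[Hashable, int]:
    """Second pass: count only the candidates exactly, then filter."""
    candidates = set(one_pass_candidates(data, theta))
    exact = Counter(x for x in data if x in candidates)
    n = len(data)
    return {item: count for item, count in exact.items() if count > theta * n}


if __name__ == "__main__":
    data = ["a", "b", "a", "c", "a", "b", "a", "d", "a", "b"]
    print(two_pass_frequent(data, 0.25))
```

Because the first pass cannot drop a truly frequent item, the second pass only needs memory proportional to the candidate set, which is the property the abstract relies on when it describes the two-pass extension as memory efficient.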