Discovering frequent itemsets over transactional data streams through an efficient and stable approximate approach

Authors: Kuen-Fang Jea; Chao-Wei Li
Affiliation (both authors): Department of Computer Science and Engineering, National Chung-Hsing University, 250 Kuo-Kuan Road, Taichung 40227, Taiwan, ROC
Venue: Expert Systems with Applications: An International Journal
Year: 2009

Abstract

A data stream is a massive, unbounded sequence of data elements generated continuously and at high speed. Mining data streams is more challenging than traditional data mining because several additional requirements must be satisfied. In this paper, we propose an algorithm for finding frequent itemsets over a transactional data stream. Unlike most existing algorithms, our method is based on the theory of Approximate Inclusion-Exclusion. Instead of incrementally maintaining an overall synopsis of the stream, it approximates itemset counts from certain retained information together with a count-bounding technique. Several additional techniques are designed and integrated into the algorithm to improve performance. The performance of the proposed algorithm is tested and analyzed through a series of experiments.
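
The abstract does not spell out the count-bounding technique, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that counts of single items (and optionally of some pairs) are available, and derives a lower and an upper bound on the count of a larger itemset using standard set-counting inequalities. The names `count_bounds`, `exact_count`, and `known_counts` are hypothetical.

```python
from itertools import combinations


def exact_count(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    target = frozenset(itemset)
    return sum(1 for t in transactions if target <= t)


def count_bounds(itemset, known_counts, n_transactions):
    """Bound the count of `itemset` from counts of its smaller subsets.

    `known_counts` maps frozensets to counts. It must contain every
    single item of `itemset`; counts of pairs are used when present.
    This is an assumed, simplified setting chosen for illustration.
    """
    items = sorted(itemset)
    singles = [known_counts[frozenset([i])] for i in items]

    # Upper bound: an itemset cannot occur more often than any subset.
    upper = min(singles)
    for pair in combinations(items, 2):
        pair_count = known_counts.get(frozenset(pair))
        if pair_count is not None:
            upper = min(upper, pair_count)

    # Lower bound: |A1 ∩ ... ∩ Ak| >= sum(|Ai|) - (k - 1) * N.
    lower = max(0, sum(singles) - (len(items) - 1) * n_transactions)
    return lower, upper


if __name__ == "__main__":
    transactions = [
        frozenset("abc"),
        frozenset("ab"),
        frozenset("ac"),
        frozenset("bc"),
        frozenset("abc"),
    ]
    # Retain only 1-itemset and 2-itemset counts.
    known = {}
    for size in (1, 2):
        for combo in combinations("abc", size):
            known[frozenset(combo)] = exact_count(combo, transactions)

    lower, upper = count_bounds("abc", known, len(transactions))
    print(lower, exact_count("abc", transactions), upper)
```

The true count always lies between the two bounds, so an approximation chosen inside that interval has a bounded error. How the paper selects the estimate within such bounds is not stated in the abstract.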