Mining frequent itemsets in a stream

Authors:
Toon Calders;Nele Dexters;Joris J. M. Gillis;Bart Goethals
Affiliations:
Eindhoven University of Technology, The Netherlands;University of Antwerp, Belgium;Hasselt University, Agoralaan Gebouw D, 3590 Diepenbeek, Belgium;University of Antwerp, Belgium
Venue:
Information Systems
Year:
2014

Abstract

Mining frequent itemsets in a data stream is a difficult problem: itemsets arrive in rapid succession, and storing parts of the stream is typically impossible. Nonetheless, the problem has many useful applications, e.g., opinion and sentiment analysis on social networks. Current stream mining algorithms are based on approximations. In earlier work, mining frequent items in a stream under the max-frequency measure proved to be effective. In this paper, we extend that work from items to itemsets. First, we present an optimized incremental algorithm for mining frequent itemsets in a stream; the algorithm maintains a very compact summary of the stream for selected itemsets. Second, we show that further compacting the summary is non-trivial. Third, we establish a connection between the size of a summary and results from number theory. Fourth, we report the results of extensive experiments on both synthetic and real-world datasets, which show the efficiency of the algorithm in terms of both time and space.
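
The abstract names the max-frequency measure without defining it. As a rough illustration only, the sketch below assumes the definition used in the authors' earlier work on flexible windows: the max-frequency of an itemset is its highest relative frequency over all windows that end at the current transaction and contain at least a minimum number of transactions. The function name `max_frequency` and the parameter `min_window_len` are hypothetical, and this naive version rescans the whole stored stream, which is exactly what the paper's compact summary is meant to avoid.

```python
from typing import Iterable, Sequence, Set


def max_frequency(stream: Sequence[Set[str]],
                  itemset: Iterable[str],
                  min_window_len: int = 1) -> float:
    """Naive max-frequency of `itemset` at the end of `stream`.

    Assumed definition: the maximum, over every window that ends at the
    most recent transaction and has at least `min_window_len` transactions,
    of the fraction of transactions in the window containing `itemset`.
    """
    target = frozenset(itemset)
    best = 0.0
    hits = 0
    # Grow the window backwards from the most recent transaction.
    for length, transaction in enumerate(reversed(stream), start=1):
        if target <= set(transaction):
            hits += 1
        # Only windows meeting the minimum length are eligible.
        if length >= min_window_len:
            best = max(best, hits / length)
    return best


if __name__ == "__main__":
    demo = [{"a", "b"}, {"b"}, {"a", "b", "c"}, {"a", "b"}]
    print(max_frequency(demo, {"a", "b"}, min_window_len=3))
```

With a minimum window length of 1, a single matching transaction at the end of the stream already yields a max-frequency of 1, which is why a minimum length constraint is assumed here.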