Approximate Frequent Itemset Discovery from Data Stream

Authors:
Anna Ciampi;Fabio Fumarola;Annalisa Appice;Donato Malerba
Affiliations:
Dipartimento di Informatica, Università degli Studi di Bari, Bari, Italy 70126;Dipartimento di Informatica, Università degli Studi di Bari, Bari, Italy 70126;Dipartimento di Informatica, Università degli Studi di Bari, Bari, Italy 70126;Dipartimento di Informatica, Università degli Studi di Bari, Bari, Italy 70126
Venue:
AI*IA '09: Proceedings of the XIth International Conference of the Italian Association for Artificial Intelligence Reggio Emilia on Emergent Perspectives in Artificial Intelligence
Year:
2009

Citing 11
Cited 1

Mining data streams under block evolution

ACM SIGKDD Explorations Newsletter
Levelwise Search and Borders of Theories in KnowledgeDiscovery

Data Mining and Knowledge Discovery
Identifying frequent items in sliding windows over on-line packet streams

Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement
Finding recent frequent itemsets adaptively over online data streams

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Mining data streams: a review

ACM SIGMOD Record
Approximate frequency counts over data streams

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
False positive or false negative: mining frequent itemsets from high speed transactional data streams

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Find Recent Frequent Items with Sliding Windows in Data Streams

IIH-MSP '07 Proceedings of the Third International Conference on International Information Hiding and Multimedia Signal Processing (IIH-MSP 2007) - Volume 02
Approximate mining of frequent patterns on streams

Intelligent Data Analysis - Knowlegde Discovery from Data Streams
Verifying and Mining Frequent Patterns from Large Windows over Data Streams

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering

A landmark-model based system for mining frequent patterns from uncertain data streams

Proceedings of the 15th Symposium on International Database Engineering & Applications

Quantified Score

Hi-index	0.00

Visualization

Abstract

Traditional algorithms for frequent itemset discovery are designed for static data. They cannot be straightforwardly applied to data streams which are continuous, unbounded, usually coming at high speed and often with a data distribution which changes with time. The main challenges of frequent pattern mining in data streams are: avoiding multiple scans of the entire dataset, optimizing memory usage and capturing distribution drift. To face these challenges, we propose a novel algorithm, which is based on a sliding window model in order to deal with efficiency issues and to keep up with distribution change. Each window consists of several slides. The generation of itemsets is local to each slide, while the estimation of their approximate support is based on the window. Efficiency in the generation of the itemsets is ensured by the usage of a synopsis structure, called SE-tree. Experiments prove the effectiveness of the proposed algorithm.