Frequent items in streaming data: An experimental evaluation of the state-of-the-art

Authors:
Nishad Manerikar;Themis Palpanas
Affiliations:
University of Trento, Via Sommarive 14 Povo, TN 38100, Italy;University of Trento, Via Sommarive 14 Povo, TN 38100, Italy
Venue:
Data & Knowledge Engineering
Year:
2009

Abstract

Detecting frequent items in streaming data is a problem relevant to many applications across different domains. Several algorithms, diverse in nature, have been proposed in the literature to solve it. In this paper, we review these algorithms and present the results of the first extensive comparative experimental study of the most prominent ones. The algorithms were tested comprehensively within a common test framework on several real and synthetic datasets. We studied their performance with respect to both parameters intrinsic to the algorithms and data-related parameters. We report the results and the insights gained through these experiments.