Efficiently Discovering Recent Frequent Items in Data Streams

Authors:
Ferry Irawan Tantono; Nishad Manerikar; Themis Palpanas
Affiliations:
University of Trento; University of Trento; University of Trento
Venue:
SSDBM '08 Proceedings of the 20th international conference on Scientific and Statistical Database Management
Year:
2008

Citing 18

Using association rules for product assortment decisions: a case study
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining

Lots o'Ticks: real time high performance time series queries on billions of trades and quotes
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data

Applications of Data Mining to Electronic Commerce
Data Mining and Knowledge Discovery

New directions in traffic measurement and accounting
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications

Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique
ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering

Computing Iceberg Queries Efficiently
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases

Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries
Proceedings of the 27th International Conference on Very Large Data Bases

Finding Frequent Items in Data Streams
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming

A simple algorithm for finding frequent elements in streams and bags
ACM Transactions on Database Systems (TODS)

Dynamically maintaining frequent items over a data stream
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management

Online Amnesic Approximation of Streaming Time Series
ICDE '04 Proceedings of the 20th International Conference on Data Engineering

What's hot and what's not: tracking most frequent items dynamically
ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2003

Sliding window filtering: an efficient method for incremental mining on a time-variant database
Information Systems

An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms

Data streams: algorithms and applications
Foundations and Trends® in Theoretical Computer Science

Multi-dimensional regression analysis of time-series data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

A framework for clustering evolving data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

Enhancing SWF for incremental association mining by itemset maintenance
PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining

Cited 5

From web data to entities and back
CAiSE'10 Proceedings of the 22nd international conference on Advanced information systems engineering

Efficient term cloud generation for streaming web content
ICWE'10 Proceedings of the 10th international conference on Web engineering

A knowledge mining framework for business analysts
ACM SIGMIS Database

dbTrento: the data and information management group at the University of Trento
ACM SIGMOD Record

Identifying streaming frequent items in ad hoc time windows
Data & Knowledge Engineering

Abstract

The problem of frequent item discovery in streaming data has attracted a lot of attention lately. While this problem has been studied extensively, and several techniques have been proposed for its solution, these approaches treat all the values of the data stream equally. Nevertheless, not all values are of equal importance: in several situations, we are more interested in the new values that have appeared in the stream than in the older ones. In this paper, we address the problem of finding recent frequent items in a data stream given a small bounded memory, and present novel algorithms in this direction. We propose a basic algorithm that extends the functionality of existing approaches by monitoring item frequencies in recent windows. Subsequently, we present an improved version of the algorithm that achieves significantly better accuracy at no extra memory cost. Finally, we perform an extensive experimental evaluation, and show that the proposed algorithms can efficiently identify the frequent items in ad hoc recent windows of a data stream.
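
The abstract does not spell out the algorithms, so the following is only a rough illustration of the general idea of monitoring item frequencies in recent windows under bounded memory. It is not the authors' method. As an assumption, it splits the stream into fixed-size blocks, keeps a Misra-Gries style counter summary per block, retains only the newest blocks, and answers a query by merging the summaries of the requested recent blocks. The class name RecentFrequentItems and the parameters block_size, max_blocks, and counters_per_block are hypothetical and chosen for this sketch.

```python
from collections import Counter, deque


class RecentFrequentItems:
    """Illustrative sketch (not the paper's algorithm): per-block
    Misra-Gries summaries kept only for the most recent blocks."""

    def __init__(self, block_size, max_blocks, counters_per_block):
        self.block_size = block_size
        self.k = counters_per_block
        # deque with maxlen drops the oldest block summary automatically,
        # so memory stays bounded by (max_blocks + 1) * k counters.
        self.blocks = deque(maxlen=max_blocks)
        self.current = Counter()
        self.seen_in_current = 0

    def add(self, item):
        if item in self.current or len(self.current) < self.k:
            self.current[item] += 1
        else:
            # Misra-Gries step: no free counter, so decrement all counters
            # and discard those that reach zero.
            for key in list(self.current):
                self.current[key] -= 1
                if self.current[key] == 0:
                    del self.current[key]
        self.seen_in_current += 1
        if self.seen_in_current == self.block_size:
            # Close the block and start a new one.
            self.blocks.append(self.current)
            self.current = Counter()
            self.seen_in_current = 0

    def frequent(self, recent_blocks, min_count):
        """Items whose estimated count over the open block plus the last
        `recent_blocks` closed blocks is at least `min_count`."""
        closed = list(self.blocks)
        recent = closed[-recent_blocks:] if recent_blocks > 0 else []
        merged = Counter(self.current)
        for summary in recent:
            merged.update(summary)  # sums the per-block counters
        return {item: c for item, c in merged.items() if c >= min_count}


# Usage: "a" dominates the old part of the stream, "b" the recent part.
tracker = RecentFrequentItems(block_size=50, max_blocks=4, counters_per_block=2)
for item in ["a"] * 100 + ["b"] * 40 + ["c"] * 10:
    tracker.add(item)

print(tracker.frequent(recent_blocks=1, min_count=20))  # {'b': 40}
print(tracker.frequent(recent_blocks=3, min_count=20))  # {'a': 100, 'b': 40}
```

In this sketch the reported counts are lower bounds: each block summary can undercount an item by at most block_size / (counters_per_block + 1), and these errors add up across the merged blocks. Windows are also only resolvable at block granularity, which is a simplification compared with the ad hoc recent windows the abstract targets.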