Online mining of frequent sets in data streams with error guarantee

Authors:
Xuan Hong Dang;Wee-Keong Ng;Kok-Leong Ong
Affiliations:
Nanyang Technological University, School of Computer Engineering, Nanyang Avenue, 639798, Singapore, Singapore;Nanyang Technological University, School of Computer Engineering, Nanyang Avenue, 639798, Singapore, Singapore;Deakin University, School of Engineering and IT, Nanyang Avenue, 3217, Waurn Ponds, VIC, Australia
Venue:
Knowledge and Information Systems
Year:
2008

Citing 21
Cited 7

Online aggregation

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Online association rule mining

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Hancock: a language for extracting signatures from data streams

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining a stream of transactions for customer patterns

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Sampling from a moving window over streaming data

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Querying and mining data streams: you only get one look a tutorial

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Finding Frequent Items in Data Streams

ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Fjording the Stream: An Architecture for Queries Over Streaming Sensor Data

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Efficient elastic burst detection in data streams

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Finding recent frequent itemsets adaptively over online data streams

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
estWin: adaptively monitoring the recent change of frequent itemsets over online data streams

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
What's hot and what's not: tracking most frequent items dynamically

ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2003
Using association rules for fraud detection in web advertising networks

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Window-aware load shedding for aggregation queries over data streams

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Approximate frequency counts over data streams

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
A regression-based temporal pattern mining scheme for data streams

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Load shedding in a data stream manager

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
False positive or false negative: mining frequent itemsets from high speed transactional data streams

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Adaptive load shedding for mining frequent patterns from data streams

DaWaK'06 Proceedings of the 8th international conference on Data Warehousing and Knowledge Discovery

Data Streaming with Affinity Propagation

ECML PKDD '08 Proceedings of the European conference on Machine Learning and Knowledge Discovery in Databases - Part II
DSM-FI: an efficient algorithm for mining frequent itemsets in data streams

Knowledge and Information Systems
A novel hash-based approach for mining frequent itemsets over data streams requiring less memory space

Data Mining and Knowledge Discovery
TOPSIL-Miner: an efficient algorithm for mining top-K significant itemsets over data streams

Knowledge and Information Systems
Mining frequent itemsets over distributed data streams by continuously maintaining a global synopsis

Data Mining and Knowledge Discovery
Interactive mining of high utility patterns over data streams

Expert Systems with Applications: An International Journal
Mining frequent itemsets in a stream

Information Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

For most data stream applications, the volume of data is too huge to be stored in permanent devices or to be thoroughly scanned more than once. It is hence recognized that approximate answers are usually sufficient, where a good approximation obtained in a timely manner is often better than the exact answer that is delayed beyond the window of opportunity. Unfortunately, this is not the case for mining frequent patterns over data streams where algorithms capable of online processing data streams do not conform strictly to a precise error guarantee. Since the quality of approximate answers is as important as their timely delivery, it is necessary to design algorithms to meet both criteria at the same time. In this paper, we propose an algorithm that allows online processing of streaming data and yet guaranteeing the support error of frequent patterns strictly within a user-specified threshold. Our theoretical and experimental studies show that our algorithm is an effective and reliable method for finding frequent sets in data stream environments when both constraints need to be satisfied.