Mining frequent itemsets in time-varying data streams

Authors:
Yingying Tao;M. Tamer Özsu
Affiliations:
University of Waterloo, Waterloo, ON, Canada;University of Waterloo, Waterloo, ON, Canada
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Citing 6
Cited 2

Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Mining data streams: a review

ACM SIGMOD Record
Research issues in data stream association rule mining

ACM SIGMOD Record
Approximate frequency counts over data streams

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
False positive or false negative: mining frequent itemsets from high speed transactional data streams

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Mining frequent patterns in a varying-size sliding window of online transactional data streams

Information Sciences: an International Journal
Discovering generalized association rules from Twitter

Intelligent Data Analysis

Quantified Score

Hi-index	0.00

Visualization

Abstract

Mining frequent itemsets in data streams is beneficial to many real-world applications but is also a challenging task since data streams are unbounded and have high arrival rates. Moreover, the distribution of data streams can change over time, which makes the task of maintaining frequent itemsets even harder. In this paper, we propose a false-negative oriented algorithm, called TWIM, that can find most of the frequent itemsets, detect distribution changes, and update the mining results accordingly. Experimental results show that our algorithm performs as good as other false-negative algorithms on data streams without distribution change, and has the ability to detect changes over time-varying data streams in -time with a high accuracy rate.