Random sampling with a reservoir
ACM Transactions on Mathematical Software (TOMS)
A guided tour of Chernoff bounds
Information Processing Letters
Reservoir-sampling algorithms of time complexity O(n(1 + log(N/n)))
ACM Transactions on Mathematical Software (TOMS)
Dynamic itemset counting and implication rules for market basket data
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Pruning and summarizing the discovered associations
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining association rules with multiple minimum supports
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Congressional samples for approximate answering of group-by queries
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Sampling from a moving window over streaming data
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Scalable Techniques for Mining Causal Structures
Data Mining and Knowledge Discovery
Mining Frequent Item Sets with Convertible Constraints
Proceedings of the 17th International Conference on Data Engineering
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
An Efficient Algorithm for Mining Association Rules in Large Databases
VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases
Sampling Large Databases for Association Rules
VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Fast Incremental Maintenance of Approximate Histograms
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Maintaining time-decaying stream aggregates
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
A new two-phase sampling based algorithm for discovering association rules
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Evaluation of sampling for data mining of association rules
RIDE '97 Proceedings of the 7th International Workshop on Research Issues in Data Engineering (RIDE '97) High Performance Database Management for Large-Scale Applications
Online Data Mining for Co-Evolving Time Sequences
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Efficient data reduction with EASE
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Finding recent frequent itemsets adaptively over online data streams
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window
ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Sampling algorithms in a stream operator
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets
Data Mining and Knowledge Discovery
Research issues in data stream association rule mining
ACM SIGMOD Record
Improved Association Rule Mining by Modified Trimming
CIT '06 Proceedings of the Sixth IEEE International Conference on Computer and Information Technology
Data Streams: Models and Algorithms (Advances in Database Systems)
Efficient Reservoir Sampling for Transactional Data Streams
ICDMW '06 Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops
Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
StatStream: statistical monitoring of thousands of data streams in real time
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
A survey on algorithms for mining frequent itemsets over data streams
Knowledge and Information Systems
Mining Frequent Itemsets in a Stream
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Maintaining frequent itemsets over high-speed data streams
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
We investigate the problem of finding frequent patterns in a continuous stream of transactions. Two approaches are prominent in the literature: (a) approximate counting, e.g., the lossy counting algorithm (LCA) of Manku and Motwani (VLDB 2002), which uses a lower support threshold than the one given by the user; and (b) maintaining a running sample, e.g., reservoir sampling (Algo-Z) of Vitter (TOMS 1985), and generating frequent itemsets from the sample on demand. Although both are known to be practically useful, to the best of our knowledge they have not been compared. In addition, we propose a distance-based sampling algorithm (DSS). We compare the algorithms empirically on synthetic and benchmark datasets. The results show that DSS is consistently more accurate than LCA and Algo-Z, and that LCA performs better than Algo-Z. An outcome of this study is a new algorithm, CLCA. In LCA, properly quantifying the error parameter, ε, is non-trivial; CLCA is a customized LCA that attempts to exploit this fact. Interestingly, CLCA outperforms all the other algorithms (including DSS) in mining the frequent itemsets of the user's choice.
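To make the two baseline approaches concrete, the following is a minimal sketch of each, under simplifying assumptions that are not taken from the paper: the lossy counter tracks single items rather than itemsets, and the sampler is the basic one-pass reservoir scheme rather than Vitter's faster Algo-Z. The function names (`lossy_count`, `frequent_items`, `reservoir_sample`) are hypothetical, and DSS and CLCA are not shown because the abstract does not describe how they work.

```python
import math
import random
from typing import Dict, Hashable, Iterable, List, Optional, Set, Tuple

Entries = Dict[Hashable, Tuple[int, int]]  # item -> (count, max undercount)


def lossy_count(stream: Iterable[Hashable], epsilon: float) -> Tuple[Entries, int]:
    """Single-item lossy counting sketch; returns (entries, stream length)."""
    width = math.ceil(1.0 / epsilon)  # number of items per bucket
    entries: Entries = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the bucket this item falls in
        if item in entries:
            count, delta = entries[item]
            entries[item] = (count + 1, delta)
        else:
            # The item may have been seen and pruned up to (bucket - 1) times.
            entries[item] = (1, bucket - 1)
        if n % width == 0:
            # Bucket boundary: drop entries whose count + delta <= bucket id.
            entries = {k: (c, d) for k, (c, d) in entries.items() if c + d > bucket}
    return entries, n


def frequent_items(entries: Entries, n: int, support: float, epsilon: float) -> Set[Hashable]:
    """Report items using the lowered threshold (support - epsilon) * n."""
    return {k for k, (c, _) in entries.items() if c >= (support - epsilon) * n}


def reservoir_sample(stream: Iterable[Hashable], k: int,
                     rng: Optional[random.Random] = None) -> List[Hashable]:
    """Basic one-pass reservoir sampling (not Algo-Z): uniform sample of size k."""
    rng = rng or random.Random()
    reservoir: List[Hashable] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item  # keep the new item with probability k / (i + 1)
    return reservoir


if __name__ == "__main__":
    data = ["a"] * 50 + ["b"] * 30 + [f"u{i}" for i in range(20)]
    entries, n = lossy_count(data, epsilon=0.05)
    print(sorted(frequent_items(entries, n, support=0.2, epsilon=0.05)))
    print(reservoir_sample(data, k=10, rng=random.Random(0)))
```

In this sketch, the lowered threshold `(support - epsilon) * n` is what the abstract refers to as using a lower support threshold than the user's; the choice of `epsilon` trades memory against the number of false positives reported.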