TOPSIL-Miner: an efficient algorithm for mining top-K significant itemsets over data streams

Authors:
Bei Yang; Houkuan Huang
Affiliations:
School of Information Engineering, Zhengzhou University, Zhengzhou, China; School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
Venue:
Knowledge and Information Systems
Year:
2010

Abstract

Frequent itemset mining over data streams has become a hot topic in data mining and knowledge discovery in recent years, and has been applied to different areas. However, setting a minimum support threshold requires some domain knowledge, and a poorly chosen threshold places a heavy burden on users. It is therefore of interest to users to find the top-K frequent itemsets over data streams instead. In this paper, a dynamic, incremental, approximate algorithm, TOPSIL-Miner, is presented to mine top-K significant itemsets in landmark windows. A new data structure, TOPSIL-Tree, is designed to store the potential significant itemsets, and further data structures (maximum support list, ordered item list, TOPSET and minimum support list) are devised to maintain information about mining results. Moreover, three optimization strategies are exploited to reduce the time and space cost of the algorithm: (1) pruning trivial nodes in the current data stream, (2) promoting the mining support threshold adaptively and heuristically during the mining process, and (3) promoting the pruning threshold dynamically. The accuracy of the algorithm is also analyzed. Extensive experiments are performed to evaluate the effectiveness, efficiency and precision of the algorithm.
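
To make the top-K setting concrete, here is a minimal Python sketch of exact top-K itemset counting over a landmark window, that is, over all transactions seen since a fixed starting point. It is not TOPSIL-Miner: the abstract describes an approximate algorithm built on the TOPSIL-Tree, whereas this sketch simply counts every itemset up to a small size. The class name NaiveTopKItemsets, the max_len cap and the sample stream are assumptions made for illustration. The sketch shows why a support threshold can be raised during mining without user input: in a landmark window counts only grow, so the support of the K-th best itemset never decreases.

```python
from collections import Counter
from itertools import combinations
import heapq


class NaiveTopKItemsets:
    """Exact top-K itemset counter over a landmark window (illustrative only)."""

    def __init__(self, k, max_len=2):
        self.k = k                  # number of itemsets to report
        self.max_len = max_len      # hypothetical cap on itemset size
        self.counts = Counter()     # itemset (sorted tuple) -> support count
        self.n_transactions = 0

    def add(self, transaction):
        """Count every itemset of size <= max_len contained in the transaction."""
        items = sorted(set(transaction))
        self.n_transactions += 1
        for size in range(1, min(self.max_len, len(items)) + 1):
            for itemset in combinations(items, size):
                self.counts[itemset] += 1

    def top_k(self):
        """Return the K most frequent itemsets; ties are broken by itemset order."""
        return heapq.nsmallest(
            self.k, self.counts.items(), key=lambda kv: (-kv[1], kv[0])
        )

    def threshold(self):
        """Support of the K-th itemset: the implied minimum support threshold."""
        top = self.top_k()
        return top[-1][1] if len(top) == self.k else 0


# Made-up sample stream, processed one transaction at a time.
stream = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
miner = NaiveTopKItemsets(k=3, max_len=2)
thresholds = []
for transaction in stream:
    miner.add(transaction)
    thresholds.append(miner.threshold())

print(miner.top_k())
print(thresholds)
```

The exhaustive counter grows combinatorially with transaction length, which is the cost that a compact tree structure and pruning strategies of the kind listed in the abstract are meant to avoid.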