DSM-FI: an efficient algorithm for mining frequent itemsets in data streams

Authors:
Hua-Fu Li;Man-Kwan Shan;Suh-Yin Lee
Affiliations:
Kainan University, Department of Computer Science, Taoyuan, Taiwan;National Chengchi University, Department of Computer Science, Taipei, Taiwan;National Chiao-Tung University, Department of Computer Science, Hsinchu, Taiwan
Venue:
Knowledge and Information Systems
Year:
2008


Abstract

Online mining of data streams is an important data mining problem with broad applications. It is also a difficult problem, because streaming data have inherent characteristics that complicate mining. In this paper, we propose a new single-pass algorithm, DSM-FI (data stream mining for frequent itemsets), for online incremental mining of frequent itemsets over a continuous stream of online transactions. DSM-FI projects each transaction of the stream into a set of sub-transactions and inserts these sub-transactions into a new in-memory summary data structure, the SFI-forest (summary frequent itemset forest), which maintains the set of all frequent itemsets embedded in the transaction data stream generated so far. The set of all frequent itemsets is then determined from the current SFI-forest. Theoretical analysis and experimental studies show that DSM-FI uses stable memory, makes only one pass over an online transactional data stream, and outperforms existing one-pass algorithms for mining frequent itemsets.
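
The projection-and-insert step described in the abstract can be illustrated with a small sketch. The abstract does not specify how sub-transactions are formed or how the SFI-forest is laid out, so everything here is an assumption made for illustration: sub-transactions are taken to be the suffixes of the sorted transaction, the forest is a plain set of prefix trees with one counter per node, and the names `Node`, `PrefixForest`, `suffix_projections`, and `path_count` are hypothetical. Pruning of infrequent items and the final itemset enumeration are omitted.

```python
class Node:
    """One item on a path of the forest, with an occurrence counter."""

    def __init__(self):
        self.count = 0
        self.children = {}


def suffix_projections(transaction):
    """Project a transaction into sub-transactions.

    Assumption: items are de-duplicated and sorted, and each suffix of the
    sorted list is one sub-transaction.
    """
    items = sorted(set(transaction))
    return [items[i:] for i in range(len(items))]


class PrefixForest:
    """Simplified stand-in for an in-memory summary forest (not the SFI-forest itself)."""

    def __init__(self):
        self.roots = {}  # first item of a sub-transaction -> root Node

    def insert(self, items):
        """Insert one sub-transaction as a path, incrementing counters along it."""
        level = self.roots
        for item in items:
            node = level.setdefault(item, Node())
            node.count += 1
            level = node.children

    def add_transaction(self, transaction):
        """Single-pass update: project the transaction, then insert every projection."""
        for sub in suffix_projections(transaction):
            self.insert(sub)

    def path_count(self, items):
        """Return the counter at the end of the path `items`, or 0 if the path is absent."""
        level = self.roots
        node = None
        for item in items:
            node = level.get(item)
            if node is None:
                return 0
            level = node.children
        return node.count if node is not None else 0


if __name__ == "__main__":
    forest = PrefixForest()
    for t in [["a", "b", "c"], ["b", "c"], ["a", "c"]]:
        forest.add_transaction(t)
    print(forest.path_count(["c"]))       # prints 3
    print(forest.path_count(["b", "c"]))  # prints 2
```

In this sketch each transaction contributes exactly one sub-transaction starting with each of its items, so the counter at a root equals the number of transactions containing that item. Counters deeper in a tree only cover items that are adjacent in sorted order, which is why a separate enumeration step over the forest would still be needed to obtain the support of arbitrary itemsets.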