Extrapolation prefix tree for data stream mining using a landmark model

Authors:
Yun Sing Koh;Russel Pears;Gillian Dobbie
Affiliations:
Department of Computer Science, University of Auckland, New Zealand;School of Computing and Mathematical Sciences, AUT University, New Zealand;Department of Computer Science, University of Auckland, New Zealand
Venue:
DaWaK'12 Proceedings of the 14th international conference on Data Warehousing and Knowledge Discovery
Year:
2012

Citing 6

Online Mining (Recently) Maximal Frequent Itemsets over Data Streams
RIDE '05 Proceedings of the 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications

CanTree: a canonical-order tree for incremental frequent-pattern mining
Knowledge and Information Systems

Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

DSM-FI: an efficient algorithm for mining frequent itemsets in data streams
Knowledge and Information Systems

CP-tree: a tree structure for single-pass frequent pattern mining
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining

A false negative approach to mining frequent itemsets from high speed transactional data streams
Information Sciences: an International Journal

Cited 1

Kernel-Tree: mining frequent patterns in a data stream based on forecast support
AI'12 Proceedings of the 25th Australasian joint conference on Advances in Artificial Intelligence

Abstract

Since the introduction of FP-growth, there has been extensive research into extending it to data streams or incremental mining. The task is particularly challenging in a data stream environment because a stream is unbounded and multiple scans of the data must be avoided. In this paper, we propose an algorithm, Extrapolation Prefix Tree, that extracts frequent itemsets using a landmark windowing scheme. The algorithm uses a prefix tree structure to store arriving transactions but, unlike previous approaches, estimates the structure of the tree in the next block of data from the arrival pattern of items in transactions that arrive in the current block. Our experiments show that Extrapolation-Tree significantly outperforms the CP-Tree, in both the number of updates and the execution time required to keep the tree current, while maintaining a compact tree.
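
The general idea of ordering a prefix tree by an estimate taken from the previous block can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not say how the extrapolation is computed, so the sketch assumes the simplest possible stand-in, namely that the item ranking for the next block is the frequency ranking observed in the current block. The names Node, insert, rank_from_block, and count_nodes are hypothetical, and tree restructuring between blocks is omitted.

```python
from collections import Counter


class Node:
    """One prefix-tree node: an item, its count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def insert(root, transaction, rank):
    """Insert one transaction, visiting items in rank order (lower rank first).

    Items missing from `rank` sort after all ranked items; ties are broken
    by item name so the ordering is deterministic.
    """
    node = root
    ordered = sorted(set(transaction), key=lambda i: (rank.get(i, len(rank)), i))
    for item in ordered:
        node = node.children.setdefault(item, Node(item))
        node.count += 1


def rank_from_block(block):
    """Assumed stand-in for extrapolation: rank items by frequency in `block`."""
    freq = Counter(item for t in block for item in set(t))
    ordered = sorted(freq, key=lambda i: (-freq[i], i))
    return {item: pos for pos, item in enumerate(ordered)}


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Two blocks with the same arrival pattern: item "c" occurs in every transaction.
current_block = [["a", "c"], ["b", "c"], ["c", "d"]]
next_block = [["a", "c"], ["b", "c"], ["c", "d"]]

# Baseline: no ranking available, so items fall back to alphabetical order.
baseline = Node()
for t in next_block:
    insert(baseline, t, {})

# Ranking estimated from the current block, applied to the next block.
predicted_rank = rank_from_block(current_block)
predicted = Node()
for t in next_block:
    insert(predicted, t, predicted_rank)

print(count_nodes(baseline), count_nodes(predicted))
```

In this toy input the estimated ranking places the shared item "c" at the top of the tree, so the three transactions share one prefix node and the tree is smaller than the alphabetical baseline. A real stream would also need the tree to be kept consistent when the ranking changes between blocks, which this sketch does not attempt.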