Improved methods for extracting frequent itemsets from interim-support trees

Authors:
F. Coenen;P. Leng;A. Pagourtzis;W. Rytter;D. Souliou
Affiliations:
Department of Computer Science, University of Liverpool, Ashton Building, Ashton Street, Liverpool L69 3BX, U.K.;Department of Computer Science, University of Liverpool, Ashton Building, Ashton Street, Liverpool L69 3BX, U.K.;School of Electrical and Computer Engineering, National Technical University of Athens, 15780 Zografou, Athens, Greece;Institute of Informatics, University of Warsaw, Poland and Department of Mathematics and Informatics, Copernicus University, Torun, Poland;School of Electrical and Computer Engineering, National Technical University of Athens, 15780 Zografou, Athens, Greece
Venue:
Software—Practice & Experience
Year:
2009

Abstract

Mining association rules in relational databases is a significant computational task with many applications. A fundamental ingredient of this task is the discovery of sets of attributes (itemsets) whose frequency in the data exceeds some threshold value. In this paper we describe two algorithms for completing the calculation of frequent sets using a tree structure for storing partial supports, called the interim-support (IS) tree. The first of our algorithms, T-Tree-First (TTF), uses a novel tree pruning technique, based on the notion of (fixed-prefix) potential inclusion, which is specially designed for trees that are implemented using only two pointers per node. This allows the IS tree to be implemented in a space-efficient manner. The second algorithm, P-Tree-First (PTF), explores the idea of storing the frequent itemsets in a second tree structure, called the total support tree (T-tree); the main innovation lies in the use of multiple pointers per node, which provides rapid access to the nodes of the T-tree and makes it possible to design a new, usually faster, method for updating them. Experimental comparison shows that these techniques result in considerable speedup for both algorithms compared with earlier approaches that also use IS trees (Principles of Data Mining and Knowledge Discovery, Proceedings of the 5th European Conference, PKDD, 2001, Freiburg, September 2001 (Lecture Notes in Artificial Intelligence, vol. 2168). Springer: Berlin, Heidelberg, 54–66; Journal of Knowledge-Based Syst. 2000; 13:141–149). Further comparison between the two new algorithms shows that PTF is generally faster on instances with a large number of frequent itemsets, provided that they are relatively short, whereas TTF is more appropriate whenever there are few or quite long frequent itemsets; in addition, TTF behaves well on instances in which the densities of the items of the database have a high variance. Copyright © 2008 John Wiley & Sons, Ltd.
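
To make the notion of a frequent itemset concrete, the following is a minimal brute-force sketch in Python. It is not the TTF or PTF algorithm and uses no IS tree or T-tree; it only illustrates the definition given in the abstract, namely an itemset whose frequency (support) in the data exceeds a threshold. The function name `frequent_itemsets`, the parameter `min_support`, and the toy data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets whose support exceeds min_support.

    Brute force: every non-empty subset of every row is counted, so the cost
    is exponential in row length. Illustrative only (hypothetical helper).
    """
    counts = Counter()
    for row in transactions:
        items = sorted(set(row))  # canonical order so equal subsets match
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    # "Exceeds the threshold" is read here as strictly greater than.
    return {s: c for s, c in counts.items() if c > min_support}


# Toy database of four rows over the attributes a, b, c (assumed data).
rows = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
]
result = frequent_itemsets(rows, min_support=1)
```

With this toy data every single item and every pair is frequent, while the triple `("a", "b", "c")` occurs only once and is excluded. Enumerating all subsets of every row is the cost that tree-based methods, such as those summarized above, are designed to avoid.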