Improved methods for extracting frequent itemsets from interim-support trees

  • Authors:
  • F. Coenen;P. Leng;A. Pagourtzis;W. Rytter;D. Souliou

  • Affiliations:
  • Department of Computer Science, University of Liverpool, Ashton Building, Ashton Street, Liverpool L69 3BX, U.K.;Department of Computer Science, University of Liverpool, Ashton Building, Ashton Street, Liverpool L69 3BX, U.K.;School of Electrical and Computer Engineering, National Technical University of Athens, 15780 Zografou, Athens, Greece;Institute of Informatics, University of Warsaw, Poland and Department of Mathematics and Informatics, Copernicus University, Torun, Poland;School of Electrical and Computer Engineering, National Technical University of Athens, 15780 Zografou, Athens, Greece

  • Venue:
  • Software—Practice & Experience
  • Year:
  • 2009

Abstract

Mining association rules in relational databases is a significant computational task with many applications. A fundamental ingredient of this task is the discovery of sets of attributes (itemsets) whose frequency in the data exceeds some threshold value. In this paper we describe two algorithms for completing the calculation of frequent sets using a tree structure for storing partial supports, called the interim-support (IS) tree. The first of our algorithms, T-Tree-First (TTF), uses a novel tree pruning technique, based on the notion of (fixed-prefix) potential inclusion, which is specially designed for trees that are implemented using only two pointers per node. This allows the IS tree to be implemented in a space-efficient manner. The second algorithm, P-Tree-First (PTF), explores the idea of storing the frequent itemsets in a second tree structure, called the total support tree (T-tree); the main innovation lies in the use of multiple pointers per node, which provides rapid access to the nodes of the T-tree and makes it possible to design a new, usually faster, method for updating them. Experimental comparison shows that these techniques result in considerable speedup for both algorithms compared with earlier approaches that also use IS trees (Principles of Data Mining and Knowledge Discovery, Proceedings of the 5th European Conference, PKDD, 2001, Freiburg, September 2001 (Lecture Notes in Artificial Intelligence, vol. 2168). Springer: Berlin, Heidelberg, 54–66; Journal of Knowledge-Based Syst. 2000; 13:141–149). Further comparison between the two new algorithms shows that PTF is generally faster on instances with a large number of frequent itemsets, provided that they are relatively short, whereas TTF is more appropriate when there are few or quite long frequent itemsets; in addition, TTF behaves well on instances in which the densities of the items of the database have a high variance. Copyright © 2008 John Wiley & Sons, Ltd.
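
To make the notion of a frequent itemset concrete, the following is a minimal brute-force sketch in Python. It is not the TTF or PTF algorithm and does not build an IS tree or a T-tree; it only illustrates the definition of support that those algorithms compute more efficiently. The function name `frequent_itemsets`, the representation of records as sets of items, and the choice to treat the threshold as an absolute count that an itemset must reach are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset contained in at
    least `min_support` transactions (hypothetical helper, brute force)."""
    # Collect every distinct item that appears in the data.
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Support = number of transactions containing the whole candidate.
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                result[cset] = support
                found_any = True
        if not found_any:
            # Every subset of a frequent set is frequent, so if no set of
            # this size is frequent, no larger set can be either.
            break
    return result


# Toy data: four records over the items a, b, c (invented for illustration).
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b"}),
]
result = frequent_itemsets(transactions, min_support=2)
```

On this toy input the frequent sets are {a}, {b}, {c}, {a, b} and {a, c}. The sketch rescans every record for every candidate, which is the cost that storing partial supports in a tree is meant to reduce.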