A lattice-based approach for I/O efficient association rule mining

  • Authors:
  • K. K. Loo; Chi Lap Yip; Ben Kao; David Cheung

  • Affiliations:
  • Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong (all authors)

  • Venue:
  • Information Systems - Databases: Creation, management and utilization
  • Year:
  • 2002

Abstract

Most algorithms for association rule mining are variants of the basic Apriori algorithm (Agrawal and Srikant, Fast algorithms for mining association rules in databases, in: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94), Santiago, Chile, 1994, pp. 487-499). One characteristic of these Apriori-based algorithms is that candidate itemsets are generated in rounds, with the size of the itemsets incremented by one per round. The number of database scans required by Apriori-based algorithms thus depends on the size of the biggest frequent itemsets. In this paper, we devise a more general candidate set generation algorithm, LGen, which generates candidate itemsets of multiple sizes during each database scan. We present an algorithm, FindLarge, which uses LGen to find frequent itemsets. We show that, given a reasonable set of suggested frequent itemsets, FindLarge can significantly reduce the number of I/O passes required. In the best cases, only two passes are sufficient to discover all the frequent itemsets, irrespective of the size of the biggest ones.

Two I/O-saving algorithms, namely DIC and Pincer-Search, are compared with FindLarge in a series of experiments. We discuss the conditions under which FindLarge significantly outperforms the others in terms of I/O efficiency.
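To make the cost that motivates this work concrete, the following is a minimal sketch of the baseline level-wise Apriori loop, with a counter for database passes. It is not LGen or FindLarge, whose details the abstract does not give. The function name `apriori`, the in-memory transaction list, and the toy data are all illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Baseline level-wise mining (illustrative sketch, not LGen/FindLarge).

    Returns (frequent, passes): a dict mapping each frequent itemset to its
    support count, and the number of full scans made over the transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    frequent = {}
    passes = 0
    k = 1
    # Round-1 candidates are the single items. A disk-based implementation
    # would discover and count them in the same first scan.
    candidates = {frozenset([item]) for t in transactions for item in t}
    while candidates:
        # One full scan of the database per round.
        passes += 1
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: unions of frequent k-itemsets that have size k + 1.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, passes


# Toy data (hypothetical): the largest frequent itemset has 3 items.
toy = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
frequent, passes = apriori(toy, min_support=2)
print(passes)  # 3
```

On this toy input the largest frequent itemset, {a, b, c}, has three items and the loop makes three scans, one per itemset size. That dependence on itemset size is what generating candidates of several sizes within a single scan is meant to remove.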