Mining association rules in very large clustered domains

  • Authors:
  • Alexandros Nanopoulos;Apostolos N. Papadopoulos;Yannis Manolopoulos

  • Affiliations:
  • Department of Informatics, Aristotle University, 54124, Thessaloniki, Greece;Department of Informatics, Aristotle University, 54124, Thessaloniki, Greece;Department of Informatics, Aristotle University, 54124, Thessaloniki, Greece

  • Venue:
  • Information Systems
  • Year:
  • 2007

Abstract

Emerging applications call for novel association-rule mining algorithms that scale not only with the number of records (rows) but also with the size of the domain (number of columns). In this paper, we focus on cases where the items of a large domain correlate with each other so that small worlds are formed, that is, the domain is clustered into groups with a large number of intra-group and a small number of inter-group correlations. This property appears in several real-world cases, e.g., in bioinformatics, e-commerce applications, and bibliographic analysis, and can help prune the search space significantly, enabling efficient association-rule mining. We develop an algorithm that partitions the domain of items according to their correlations, and we describe a mining algorithm that carefully combines partitions to improve efficiency. Our experiments show that the proposed method outperforms existing algorithms and that it overcomes the problems (e.g., increased CPU cost and possible I/O thrashing) that existing algorithms incur when a large domain is combined with a large number of records.
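
The abstract does not spell out the partitioning or mining procedures, so the following is only a minimal Python sketch of the general idea and not the authors' algorithm. It assumes that two items are "correlated" when their pair reaches a minimum support count, that partitions are the connected components of the resulting graph, and that each partition is mined separately with a plain level-wise (Apriori-style) search. The function names `partition_items` and `frequent_itemsets_in_partition` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def partition_items(transactions, min_support):
    """Group items into connected components of the frequent-pair graph.

    Assumption: an edge joins two items when they co-occur in at least
    `min_support` transactions. This stands in for the paper's notion
    of correlation, which the abstract does not define.
    """
    pair_counts = defaultdict(int)
    items = set()
    for t in transactions:
        unique = sorted(set(t))
        items.update(unique)
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1

    # Union-find over items; merge the groups of every frequent pair.
    parent = {i: i for i in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in pair_counts.items():
        if count >= min_support:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for i in items:
        groups[find(i)].add(i)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def frequent_itemsets_in_partition(transactions, partition, min_support):
    """Level-wise search for frequent itemsets restricted to one partition."""
    part = set(partition)
    # Project every transaction onto the partition's items.
    projected = [frozenset(t) & part for t in transactions]
    result = {}
    level = [frozenset([i]) for i in sorted(part)]
    while level:
        counts = {c: sum(1 for t in projected if c <= t) for c in level}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets into candidate (k+1)-itemsets.
        keys = list(frequent)
        level = list({a | b for a, b in combinations(keys, 2)
                      if len(a | b) == len(a) + 1})
    return result
```

Under these assumptions, every frequent itemset with two or more items lies inside a single partition, because all of its item pairs are themselves frequent. The search therefore never has to enumerate candidates that span weakly correlated groups, which is the kind of pruning the abstract alludes to. The paper's method additionally combines partitions, a step this sketch does not attempt.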