Using quantitative information for efficient association rule generation

  • Authors:
  • B. Pôssas; M. Carvalho; R. Resende; W. Meira, Jr.

  • Affiliations:
  • Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte - MG - Brazil;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte - MG - Brazil;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte - MG - Brazil;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte - MG - Brazil

  • Venue:
  • ACM SIGMOD Record
  • Year:
  • 2000

Abstract

The problem of mining association rules in categorical data from customer transactions was introduced by Agrawal, Imielinski and Swami [2]. This seminal work gave rise to several research efforts [4, 13], resulting in descriptions of how to extend the original concepts and how to increase the performance of the related algorithms.

The original problem of mining association rules was formulated as how to find rules of the form set1 → set2. Such a rule is supposed to denote affinity or correlation between the two sets, which contain nominal or ordinal data items. More specifically, such an association rule should convey the following meaning: customers that buy the products in set1 also buy the products in set2. The statistical basis is expressed as minimum support and confidence measures of these rules with respect to the set of customer transactions.

The original problem as proposed by Agrawal et al. [2] was extended in several directions, such as adding or replacing the confidence and support by other measures, filtering the rules during or after generation, or including quantitative attributes. Srikant and Agrawal [16] describe a new approach where quantitative data can be treated as categorical. This is very important, since otherwise part of the customer transaction information is discarded.

Whenever an extension is proposed, its performance must be checked. The efficiency of the algorithm is linked to the size of the database that can feasibly be processed. Therefore it is crucial to have efficient algorithms that enable us to examine and extract valuable decision-making information from ever larger databases.

In this paper we present an algorithm that can be used in the context of several of the extensions provided in the literature while preserving its performance, as demonstrated in a case study.
The approach in our algorithm is to explore multidimensional properties of the data (provided such properties are present), allowing us to combine this additional information in a very efficient pruning phase. This results in a very flexible and efficient algorithm that was used with success in several experiments using categorical and quantitative databases.

The paper is organized as follows. In the next section we describe quantitative association rules and present an algorithm to generate them. Section 3 presents an optimization of the pruning phase of the Apriori [4] algorithm based on quantitative information associated with the items. Section 4 presents our experimental results for mining four synthetic workloads, followed by some related work in Section 5. Finally, we present some conclusions and future work in Section 6.
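
To make the support and confidence measures mentioned in the abstract concrete, the following is a minimal sketch of how they are conventionally computed for a rule set1 → set2 over a list of transactions. It follows the standard definitions and is not the algorithm proposed in the paper; the function names `support` and `confidence` and the toy transactions are assumptions made purely for illustration.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(set1: Iterable[str], set2: Iterable[str],
               transactions: List[Transaction]) -> float:
    """support(set1 ∪ set2) / support(set1) for the rule set1 -> set2."""
    base = support(set1, transactions)
    if base == 0.0:
        return 0.0  # rule antecedent never occurs
    return support(frozenset(set1) | frozenset(set2), transactions) / base


# Hypothetical toy data (not taken from the paper).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

A rule would be reported only if both values meet user-chosen minimum thresholds. In this toy data, bread and milk appear together in two of the four transactions, and bread appears in three of them.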