Using quantitative information for efficient association rule generation

Authors:
B. Pôssas;M. Carvalho;R. Resende;W. Meira, Jr.
Affiliations:
Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte - MG - Brazil;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte - MG - Brazil;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte - MG - Brazil;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte - MG - Brazil
Venue:
ACM SIGMOD Record
Year:
2000

Citing 14

Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data

An effective hash-based algorithm for mining association rules
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data

Mining quantitative association rules in large relational tables
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data

Association rules over interval data
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data

Fast discovery of association rules
Advances in knowledge discovery and data mining

Pruning and summarizing the discovered associations
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining

Mining the most interesting rules
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining

A statistical theory for quantitative association rules
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining

Multidimensional binary search trees used for associative searching
Communications of the ACM

Database Mining: A Performance Perspective
IEEE Transactions on Knowledge and Data Engineering

Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases

An Efficient Algorithm for Mining Association Rules in Large Databases
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases

Constraint-Based Rule Mining in Large, Dense Databases
ICDE '99 Proceedings of the 15th International Conference on Data Engineering

OPUS: an efficient admissible algorithm for unordered search
Journal of Artificial Intelligence Research

Cited 2

Unified descriptive language for association rules in data mining
Second international workshop on Intelligent systems design and application

Discovering search engine related queries using association rules
Journal of Web Engineering

Abstract

The problem of mining association rules in categorical data from customer transactions was introduced by Agrawal, Imielinski and Swami [2]. This seminal work gave rise to several lines of investigation [4, 13], which describe how to extend the original concepts and how to improve the performance of the related algorithms.

The original problem of mining association rules was formulated as finding rules of the form set1 → set2. Such a rule denotes affinity or correlation between two sets of nominal or ordinal data items. More specifically, it conveys the following meaning: customers who buy the products in set1 also buy the products in set2. The statistical basis of these rules is expressed as minimum support and confidence measures with respect to the set of customer transactions.

The original problem as proposed by Agrawal et al. [2] has been extended in several directions, such as adding to or replacing the confidence and support measures with other measures, filtering the rules during or after generation, or including quantitative attributes. Srikant and Agrawal [16] describe a new approach in which quantitative data can be treated as categorical. This is very important, since otherwise part of the customer transaction information is discarded.

Whenever an extension is proposed, its performance must be checked. An algorithm's efficiency determines the size of the database it can handle. It is therefore crucial to have efficient algorithms that let us examine and extract valuable decision-making information from ever larger databases.

In this paper we present an algorithm that can be used in the context of several of the extensions proposed in the literature while preserving its performance, as demonstrated in a case study.
The approach of our algorithm is to exploit multidimensional properties of the data (provided such properties are present), which lets us use this additional information in a very efficient pruning phase. The result is a flexible and efficient algorithm that was used successfully in several experiments on categorical and quantitative databases.

The paper is organized as follows. In the next section we describe quantitative association rules and present an algorithm to generate them. Section 3 presents an optimization of the pruning phase of the Apriori algorithm [4] based on quantitative information associated with the items. Section 4 presents our experimental results for mining four synthetic workloads, followed by related work in Section 5. Finally, Section 6 presents conclusions and future work.
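
To make the support and confidence measures mentioned in the abstract concrete, the following is a minimal sketch of how they are conventionally computed for a rule set1 → set2 over a list of customer transactions. It follows the standard definitions from the association rule literature and is not the algorithm proposed in the paper. The function names, the sample transactions, and the thresholds are illustrative assumptions.

```python
from typing import FrozenSet, Sequence

# Hypothetical sample data: each transaction is the set of items one customer bought.
TRANSACTIONS: Sequence[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: FrozenSet[str], transactions: Sequence[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(
    set1: FrozenSet[str],
    set2: FrozenSet[str],
    transactions: Sequence[FrozenSet[str]],
) -> float:
    """Among transactions containing set1, the fraction that also contain set2."""
    base = support(set1, transactions)
    if base == 0.0:
        return 0.0
    return support(set1 | set2, transactions) / base


def satisfies_thresholds(
    set1: FrozenSet[str],
    set2: FrozenSet[str],
    transactions: Sequence[FrozenSet[str]],
    min_support: float,
    min_confidence: float,
) -> bool:
    """True if the rule set1 -> set2 meets both user-chosen minimum thresholds."""
    rule_support = support(set1 | set2, transactions)
    rule_confidence = confidence(set1, set2, transactions)
    return rule_support >= min_support and rule_confidence >= min_confidence
```

With the sample data, the rule {bread} → {milk} holds in 2 of 4 transactions, and 2 of the 3 transactions containing bread also contain milk, so the rule passes a minimum confidence of 0.6 but not one of 0.7.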