An information-theoretic approach to quantitative association rule mining

Authors:
Yiping Ke; James Cheng; Wilfred Ng
Affiliations:
The Hong Kong University of Science and Technology, Department of Computer Science and Engineering, Clear Water Bay, Kowloon, Hong Kong (all three authors)
Venue:
Knowledge and Information Systems
Year:
2008



Abstract

Quantitative association rule (QAR) mining has been recognized as an influential research problem over the last decade, owing to the popularity of quantitative databases and the usefulness of association rules in real life. Unlike boolean association rules (BARs), which consider only boolean attributes, QARs consist of quantitative attributes, which contain much richer information than boolean attributes. However, combining these quantitative attributes and their value intervals always gives rise to an explosively large number of itemsets, which severely degrades mining efficiency. In this paper, we propose an information-theoretic approach that prevents unrewarding combinations of both the attributes and their value intervals from being generated during the mining process. We study the mutual information between the attributes in a quantitative database and devise a normalization of the mutual information that makes it applicable in the context of QAR mining. To indicate the strong informative relationships among the attributes, we construct a mutual information graph (MI graph), in which an edge joins each attribute pair whose normalized mutual information is no less than a predefined information threshold. We find that the cliques in the MI graph represent a majority of the frequent itemsets. We also show that the frequent itemsets that do not form a clique in the MI graph are those whose attributes are not informatively correlated with each other. By utilizing the cliques in the MI graph, we devise an efficient algorithm, MIC, that significantly reduces the number of value intervals of the attribute sets to be joined during the mining process. Extensive experiments show that MIC speeds up the mining process by up to two orders of magnitude. Most importantly, MIC still obtains most of the high-confidence QARs, and the QARs it does not return are shown to be less interesting.
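The MI-graph idea can be illustrated with a minimal sketch. It is not the authors' MIC algorithm and it omits the interval-joining step entirely. It assumes every quantitative attribute has already been discretized into integer interval labels. The abstract does not give the exact normalization, so the sketch assumes I(X;Y) / min(H(X), H(Y)), which lies in [0, 1]. The threshold parameter `mu`, the helper names, and the toy `table` are all hypothetical. Maximal cliques are enumerated with the standard Bron-Kerbosch procedure, which is used here as a generic stand-in rather than as the paper's method.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(column):
    """Shannon entropy H(X), in bits, of one discretized attribute."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def mutual_information(x, y):
    """Empirical I(X;Y) = sum p(x,y) * log2(p(x,y) / (p(x) * p(y)))."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y)).
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def normalized_mi(x, y):
    """ASSUMED normalization (not taken from the paper): I(X;Y) / min(H(X), H(Y))."""
    h = min(entropy(x), entropy(y))
    return 0.0 if h == 0 else mutual_information(x, y) / h


def mi_graph(table, mu):
    """Adjacency sets: an edge joins attributes whose normalized MI is >= mu."""
    adj = {attr: set() for attr in table}
    for a, b in combinations(table, 2):
        if normalized_mi(table[a], table[b]) >= mu:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting: yields each maximal clique as a set."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        for v in list(p):  # iterate over a snapshot because p shrinks below
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


# Hypothetical toy table: each list holds one attribute's interval labels.
table = {
    "age":    [0, 0, 1, 1, 2, 2, 0, 1],
    "salary": [0, 0, 1, 1, 2, 2, 0, 1],  # mirrors age, so normalized MI = 1
    "city":   [0, 1, 0, 1, 0, 1, 0, 1],  # only weakly related to age/salary
}

adj = mi_graph(table, mu=0.5)
print(sorted(sorted(c) for c in maximal_cliques(adj)))
# → [['age', 'salary'], ['city']]
```

In this toy run, only `age` and `salary` are informatively correlated, so only that pair would be considered for joint itemset generation. Under the approach described above, candidate attribute sets would be restricted to subsets of the MI-graph cliques rather than to all attribute combinations.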