Reducing rule covers with deterministic error bounds

Authors:
Vikram Pudi;Jayant R. Haritsa
Affiliations:
Database Systems Lab, SERC, Indian Institute of Science, Bangalore, India;Database Systems Lab, SERC, Indian Institute of Science, Bangalore, India
Venue:
PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2003

Citing 12
Cited 1

Finding interesting rules from large sets of discovered association rules

CIKM '94 Proceedings of the third international conference on Information and knowledge management
Online association rule mining

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Pruning and summarizing the discovered associations

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Generating non-redundant association rules

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Online Generation of Association Rules

ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Interestingness of Discovered Association Rules in Terms of Neighborhood-Based Unexpectedness

PAKDD '98 Proceedings of the Second Pacific-Asia Conference on Research and Development in Knowledge Discovery and Data Mining
On the Efficiency of Association-Rule Mining Algorithms

PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Constraint-Based Rule Mining in Large, Dense Databases

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Mining Bases for Association Rules Using Closed Sets

ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Approximate frequency counts over data streams

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Constraining and summarizing association rules in medical data

Knowledge and Information Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

The output of boolean association rule mining algorithms is often too large for manual examination. For dense datasets, it is often impractical to even generate all frequent itemsets. The closed itemset approach handles this information overload by pruning "uninteresting" rules following the observation that most rules can be derived from other rules. In this paper, we propose a new framework, namely, the generalized closed (or g-closed) itemset framework. By allowing for a small tolerance in the accuracy of itemset supports, we show that the number of such redundant rules is far more than what was previously estimated. Our scheme can be integrated into both levelwise algorithms (Apriori) and two-pass algorithms (ARMOR). We evaluate its performance by measuring the reduction in output size as well as in response time. Our experiments show that incorporating g-closed itemsets provides significant performance improvements on a variety of databases.