Optimized Disjunctive Association Rules via Sampling

Authors:
J. Elble;C. Heeren;L. Pitt
Affiliations:
-;-;-
Venue:
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Year:
2003

Abstract

The problem of finding optimized support association rules for a single numerical attribute, where the optimized region is a union of k disjoint intervals from the range of the attribute, is investigated. The first polynomial-time algorithm for the problem of finding such a region maximizing support and meeting a minimum cumulative confidence threshold is given. Because the algorithm is not practical, an ostensibly easier, more constrained version of the problem is considered. Experiments demonstrate that the best extant algorithm for the constrained version has significant performance degradation on both a synthetic model of patterned data and on real-world data sets. Running the algorithm on a small random sample is proposed as a means of obtaining near-optimal results with high probability. Theoretical bounds on sufficient sample size to achieve a given performance level are proved, and rapid convergence on synthetic and real-world data is validated experimentally.
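
To make the quantities in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes each record is a (value, hit) pair, where hit says whether the record satisfies the rule's consequent, and it assumes "cumulative confidence" means the confidence computed over the union of intervals as a whole. The function names, the record layout, and the use of closed intervals are all hypothetical choices made for illustration. The sketch only evaluates a given region on the full data or on a uniform random sample; it does not search for the optimal region.

```python
import random


def support_and_confidence(records, intervals):
    """Evaluate a region given as disjoint closed intervals.

    records   -- list of (value, hit) pairs (assumed layout)
    intervals -- list of (lo, hi) pairs, assumed disjoint
    Returns (support, cumulative confidence) of the union of intervals.
    """
    # Keep the consequent flag of every record whose value falls in the region.
    covered = [hit for value, hit in records
               if any(lo <= value <= hi for lo, hi in intervals)]
    # Support: fraction of all records covered by the region.
    support = len(covered) / len(records) if records else 0.0
    # Cumulative confidence: fraction of covered records that are hits.
    confidence = sum(covered) / len(covered) if covered else 0.0
    return support, confidence


def estimate_on_sample(records, intervals, sample_size, seed=0):
    """Estimate the same two quantities from a uniform random sample."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    return support_and_confidence(sample, intervals)
```

With a region of k = 2 intervals, `support_and_confidence(records, [(1, 3), (10, 11)])` scores the whole data set, while `estimate_on_sample(records, [(1, 3), (10, 11)], sample_size=1000)` scores a sample. The abstract's claim is that sample-based results of this kind approach the full-data optimum with high probability once the sample is large enough.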