The problem of finding optimized support association rules for a single numerical attribute, where the optimized region is a union of k disjoint intervals from the range of the attribute, is investigated. The first polynomial-time algorithm for finding such a region, one that maximizes support while meeting a minimum cumulative confidence threshold, is given. Because this algorithm is not practical, an ostensibly easier, more constrained version of the problem is considered. Experiments demonstrate that the best extant algorithm for the constrained version suffers significant performance degradation both on a synthetic model of patterned data and on real-world data sets. Running the algorithm on a small random sample is proposed as a means of obtaining near-optimal results with high probability. Theoretical bounds on the sample size sufficient to achieve a given performance level are proved, and rapid convergence on synthetic and real-world data is validated experimentally.
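To make the problem statement concrete, here is a minimal Python sketch under stated assumptions. It assumes the attribute's range has already been partitioned into ordered buckets, with `support[i]` counting the tuples in bucket i and `hits[i]` counting those that also satisfy the rule's consequent, and it reads "a union of k disjoint intervals" as "at most k maximal runs of buckets". The solver is an exhaustive, exponential-time reference checker, not the polynomial-time algorithm described above, and all function names are hypothetical. `sampled_region` only illustrates the sampling idea (solve on a uniform random sample of tuples); it does not implement the paper's sample-size bounds.

```python
import random
from itertools import product


def region_totals(mask, support, hits):
    """Total support and consequent hits over the buckets selected by mask."""
    s = sum(x for x, chosen in zip(support, mask) if chosen)
    c = sum(h for h, chosen in zip(hits, mask) if chosen)
    return s, c


def count_runs(mask):
    """Number of maximal runs of selected buckets, i.e. disjoint intervals."""
    runs, prev = 0, False
    for bit in mask:
        if bit and not prev:
            runs += 1
        prev = bit
    return runs


def optimized_support_bruteforce(support, hits, k, min_conf):
    """Exhaustively find the union of at most k bucket intervals that maximizes
    support subject to cumulative confidence >= min_conf.

    Returns (best_support, best_mask), or (0, None) if no region qualifies.
    Exponential in the number of buckets: a reference checker for tiny inputs.
    """
    best_support, best_mask = 0, None
    for mask in product((False, True), repeat=len(support)):
        if not any(mask) or count_runs(mask) > k:
            continue
        s, c = region_totals(mask, support, hits)
        # Cumulative confidence c / s >= min_conf, checked without dividing.
        if c >= min_conf * s and s > best_support:
            best_support, best_mask = s, mask
    return best_support, best_mask


def bucket_counts(rows, num_buckets):
    """Turn (bucket_id, satisfies_consequent) rows into per-bucket counts."""
    support, hits = [0] * num_buckets, [0] * num_buckets
    for b, hit in rows:
        support[b] += 1
        hits[b] += int(hit)
    return support, hits


def sampled_region(rows, num_buckets, k, min_conf, sample_size, seed=None):
    """Solve on a uniform random sample of rows drawn without replacement."""
    sample = random.Random(seed).sample(rows, sample_size)
    support, hits = bucket_counts(sample, num_buckets)
    return optimized_support_bruteforce(support, hits, k, min_conf)
```

For example, with `support = [10] * 5`, `hits = [9, 1, 9, 1, 9]` and `min_conf = 0.8`, the best support is 10 for k = 1, 20 for k = 2, and 30 for k = 3 (buckets 0, 2 and 4), showing how additional intervals let the region skip low-confidence buckets. A mask chosen on a sample can then be scored against the full data with `region_totals`.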