Sampling ensembles for frequent patterns

Authors:
Caiyan Jia;Ruqian Lu
Affiliations:
Lab of Intelligent Information Processing, Institute of Computing Technology, Academia Sinica, Beijing, China;Lab of Intelligent Information Processing, Institute of Computing Technology, Academia Sinica, Beijing, China
Venue:
FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part I
Year:
2005

Abstract

A popular way to improve the speed and scalability of association rule mining is to run the mining algorithm on a random sample instead of the entire database, but this comes at the expense of accuracy. In this paper, we present a sampling ensemble approach that improves accuracy for a given sample size. Using Monte Carlo theory, we explain why a sampling ensemble works and derive a theoretical lower bound on the sample size that ensures the feasibility and validity of an ensemble. To identify the sources of sample error, and thereby give theoretical guidance for obtaining more accurate answers, we apply bias-variance decomposition to the sample error of an ensemble. Based on theoretical analysis and experiments, we conclude that the sampling ensemble method not only significantly improves accuracy but also offers a new way to address the difficulty of determining an appropriate sample size.
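
The abstract does not spell out how the ensemble is built, so the following is only a minimal sketch of one plausible reading: draw several random samples, mine each one independently, and keep the itemsets that a majority of samples report as frequent. The function names, the brute-force miner, the sampling without replacement, and the majority-vote rule are all assumptions made for illustration, not the authors' algorithm.

```python
import random
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force miner (illustrative only): return itemsets of up to
    max_size items whose relative support is at least min_support."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    n = len(transactions)
    return {s for s, c in counts.items() if c / n >= min_support}


def sampling_ensemble(transactions, min_support, sample_size, n_samples, seed=0):
    """Hypothetical ensemble: mine n_samples random samples and keep the
    itemsets found frequent in more than half of them (majority vote)."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        # Assumption: each sample is drawn without replacement.
        sample = rng.sample(transactions, sample_size)
        votes.update(frequent_itemsets(sample, min_support))
    return {s for s, v in votes.items() if v > n_samples / 2}


if __name__ == "__main__":
    # Toy database: 90 transactions {a, b} and 10 transactions {c}.
    db = [["a", "b"]] * 90 + [["c"]] * 10
    print(sorted(sampling_ensemble(db, 0.5, sample_size=50, n_samples=5)))
```

In this sketch, an itemset that a single small sample misreports by chance is outvoted by the other samples, which is the intuition behind using an ensemble to improve accuracy at a fixed sample size.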