Mining association rules between sets of items in large databases. In SIGMOD '93: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data.
Boosting a weak learning algorithm by majority. Information and Computation.
Data mining using two-dimensional optimized association rules: scheme, algorithms, and visualization. In SIGMOD '96: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data.
A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, special issue on the 26th Annual ACM Symposium on the Theory of Computing (STOC '94), May 23–25, 1994, and the Second Annual European Conference on Computational Learning Theory (EuroCOLT '95), March 13–15, 1995.
Mining optimized gain rules for numeric attributes. In KDD '99: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
On approximating rectangle tiling and packing. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms.
Algorithms for the maximum subarray problem based on matrix multiplication. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms.
Transversing itemset lattices with statistical metric pruning. In PODS '00: Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.
Programming pearls: algorithm design techniques. Communications of the ACM.
Fast algorithms for mining association rules in large databases. In VLDB '94: Proceedings of the 20th International Conference on Very Large Data Bases.
On classification and regression. In DS '98: Proceedings of the First International Conference on Discovery Science.
The strength of weak learnability. In SFCS '89: Proceedings of the 30th Annual Symposium on Foundations of Computer Science.
ACM SIGKDD Explorations Newsletter.
Partial least squares regression for graph mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
A fast boosting-based learner for feature-rich tagging and chunking. In CoNLL '08: Proceedings of the Twelfth Conference on Computational Natural Language Learning.
Subtree mining for question classification problem. In IJCAI '07: Proceedings of the 20th International Joint Conference on Artificial Intelligence.
Improving subtree-based question classification classifiers with word-cluster models. In NLDB '11: Proceedings of the 16th International Conference on Natural Language Processing and Information Systems.
This paper sheds light on a strong connection between AdaBoost and several optimization algorithms for data mining. AdaBoost has attracted much interest as an effective methodology for classification tasks: it generates one hypothesis in each round and finally makes a highly accurate prediction by taking a weighted majority vote over the resulting hypotheses. Freund and Schapire have remarked that using simple hypotheses, such as single-test decision trees, instead of huge trees is a promising way to achieve high accuracy while avoiding overfitting the training data. One major drawback of this approach, however, is that the accuracy of an individual simple hypothesis may not be high, so a way of efficiently computing more accurate (or the most accurate) simple hypotheses is needed. In this paper, we consider several classes of simple but expressive hypotheses, such as ranges and regions for numeric attributes, subsets of categorical values, and conjunctions of Boolean tests. For each class, we develop an efficient algorithm for choosing the optimal hypothesis.
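To make the mechanism concrete, the following is a minimal sketch in Python/NumPy, not the paper's implementation: AdaBoost over single-attribute range hypotheses ("predict +1 iff l < x <= u"), where each round picks the error-minimizing range by reducing the search to a maximum-subarray (Kadane) scan per attribute, in the spirit of the Programming Pearls and maximum-subarray references above. All names here (fit_range_stump, adaboost, predict) are illustrative assumptions, and the scan shown stands in for the paper's specialized optimizers.

```python
import numpy as np

def fit_range_stump(X, y, w):
    """Sketch of an optimal 'range' hypothesis search (not the paper's code).
    For each attribute j, find the range (l, u] minimizing the weighted error
    of 'predict +1 iff l < x <= u'.  With gains g_k = sum of y_i * w_i over
    examples taking the k-th sorted value, the error equals
    (total weight of positives) - (sum of g_k inside the range), so
    minimizing it is a maximum-subarray problem solvable by Kadane's scan."""
    n, d = X.shape
    w_pos = w[y == 1].sum()
    best = (np.inf, 0, -np.inf, np.inf)      # (error, feature, lower, upper)
    for j in range(d):
        xs = np.unique(X[:, j])              # sorted distinct values
        g = np.zeros(len(xs))
        np.add.at(g, np.searchsorted(xs, X[:, j]), y * w)
        run, lo = 0.0, 0
        for k in range(len(xs)):
            if run <= 0.0:                   # Kadane: restart the run
                run, lo = 0.0, k
            run += g[k]
            err = w_pos - run                # error of range (xs[lo-1], xs[k]]
            if err < best[0]:
                lower = xs[lo - 1] if lo > 0 else -np.inf
                best = (err, j, lower, xs[k])
    return best

def adaboost(X, y, rounds=30):
    """Plain AdaBoost (Freund & Schapire) over optimal range hypotheses."""
    w = np.full(len(y), 1.0 / len(y))        # uniform initial example weights
    ensemble = []
    for _ in range(rounds):
        err, j, l, u = fit_range_stump(X, y, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)
        alpha = 0.5 * np.log((1.0 - err) / err)   # vote weight of this round
        h = np.where((X[:, j] > l) & (X[:, j] <= u), 1, -1)
        w *= np.exp(-alpha * y * h)               # emphasize mistakes
        w /= w.sum()
        ensemble.append((alpha, j, l, u))
    return ensemble

def predict(ensemble, X):
    """Final weighted majority vote over all rounds' hypotheses."""
    votes = sum(a * np.where((X[:, j] > l) & (X[:, j] <= u), 1, -1)
                for a, j, l, u in ensemble)
    return np.where(votes >= 0, 1, -1)
```

With one sort per attribute, each round of this sketch costs roughly O(dn log n); the paper's subject is exactly such efficient optimal-hypothesis computations, extended to the richer classes listed in the abstract (two-dimensional regions, subsets of categorical values, and conjunctions of Boolean tests).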