This paper demonstrates experimentally that concluding which induction algorithm is more accurate on the basis of a single partition of the instances into cross-validation folds can lead to statistically erroneous conclusions. Comparing two decision-tree induction algorithms and one naive-Bayes induction algorithm, we find situations in which one algorithm is judged more accurate at the p = 0.05 level under one partition of the training instances, while the other algorithm is judged more accurate at the p = 0.05 level under an alternate partition. We recommend a new significance procedure that performs cross-validation over multiple instance-space partitions: the paired Student t-test is applied separately to the results from each cross-validation partition, the resulting t values are averaged, and this averaged value is converted into a significance value.
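The recommended procedure can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the per-fold accuracy differences are made-up numbers, and the final comparison of the averaged statistic against a fixed t critical value (df = 9, two-sided 0.05) stands in for the abstract's "converting this averaged value into a significance value" step.

```python
import statistics

def paired_t(diffs):
    """Paired Student t statistic for per-fold accuracy differences
    (algorithm A accuracy minus algorithm B accuracy, one value per fold)."""
    n = len(diffs)
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean / (sd / n ** 0.5)

def averaged_t(partitions):
    """Average the paired t statistics across several independent
    cross-validation partitions of the same instance space."""
    return statistics.fmean(paired_t(d) for d in partitions)

# Hypothetical per-fold accuracy differences for three different
# 10-fold cross-validation partitions (illustrative numbers only).
partitions = [
    [0.021, -0.004, 0.013, 0.008, -0.011, 0.017, 0.005, 0.009, -0.002, 0.012],
    [0.006, 0.015, -0.009, 0.011, 0.003, -0.005, 0.019, 0.002, 0.010, -0.001],
    [0.014, -0.007, 0.004, 0.016, -0.003, 0.008, 0.001, 0.013, -0.006, 0.009],
]

t_avg = averaged_t(partitions)
# Treating the averaged value as a t statistic with n - 1 = 9 degrees of
# freedom, compare against the two-sided 0.05 critical value (about 2.262).
significant = abs(t_avg) > 2.262
print(f"averaged t = {t_avg:.3f}, significant at p = 0.05: {significant}")
```

Averaging over several partitions damps the dependence of the verdict on any single fold assignment, which is exactly the instability the experiments expose.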