This paper demonstrates experimentally that concluding which induction algorithm is more accurate on the basis of a single partition of the instances into cross-validation folds can lead to statistically erroneous conclusions. Comparing two decision-tree induction algorithms and one naive-Bayes induction algorithm, we find situations in which one algorithm is judged more accurate at the p = 0.05 level under one partition of the training instances, while the other algorithm is judged more accurate at the p = 0.05 level under an alternate partition. We recommend a new significance procedure that performs cross-validation over multiple instance-space partitions: the paired Student t-test is applied separately to the results from each cross-validation partition, the resulting t values are averaged, and this averaged value is converted into a significance value.
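The recommended procedure can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the per-fold accuracy differences are made-up numbers, and the final comparison of the averaged statistic against a fixed t critical value (df = 9, two-sided 0.05) stands in for the abstract's "converting this averaged value into a significance value" step.

```python
import statistics

def paired_t(diffs):
    """Paired Student t statistic for per-fold accuracy differences
    (algorithm A accuracy minus algorithm B accuracy, one value per fold)."""
    n = len(diffs)
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean / (sd / n ** 0.5)

def averaged_t(partitions):
    """Average the paired t statistics across several independent
    cross-validation partitions of the same instance space."""
    return statistics.fmean(paired_t(d) for d in partitions)

# Hypothetical per-fold accuracy differences for three different
# 10-fold cross-validation partitions (illustrative numbers only).
partitions = [
    [0.021, -0.004, 0.013, 0.008, -0.011, 0.017, 0.005, 0.009, -0.002, 0.012],
    [0.006, 0.015, -0.009, 0.011, 0.003, -0.005, 0.019, 0.002, 0.010, -0.001],
    [0.014, -0.007, 0.004, 0.016, -0.003, 0.008, 0.001, 0.013, -0.006, 0.009],
]

t_avg = averaged_t(partitions)
# Treating the averaged value as a t statistic with n - 1 = 9 degrees of
# freedom, compare against the two-sided 0.05 critical value (about 2.262).
significant = abs(t_avg) > 2.262
print(f"averaged t = {t_avg:.3f}, significant at p = 0.05: {significant}")
```

Averaging over several partitions damps the dependence of the verdict on any single fold assignment, which is exactly the instability the experiments expose.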