Investigations on Stability and Overoptimism of Classification Trees by Using Cross-Validation

Authors:
Willi Sauerbrei
Venue:
ISMDA '01 Proceedings of the Second International Symposium on Medical Data Analysis
Year:
2001

Abstract

Classification rules are often developed using tree methodology. Using data from a diagnostic study in which Doppler flow signals were measured to distinguish malignant from benign breast tumors, I discuss the search for the cutpoint of a continuous variable that yields the minimal p-value, and the need to correct this p-value for multiple testing. Ignoring the correction strongly favors continuous variables during tree development and may lead to useless trees. I further investigate the influence of tree complexity by estimating overoptimism as the difference between the apparent error rate, computed on the original data, and the error rate estimated by 5-fold cross-validation. Finally, I consider how using predefined cutpoints affects tree development and the resulting error rates.
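
The two ideas in the abstract, the minimal p-value cutpoint search and the overoptimism estimate, can be illustrated with a small sketch. It is not the paper's procedure: the data are synthetic, the "tree" is a single split (a stump), the test is a plain 1-df chi-square test, and the multiple-testing correction is a simple Bonferroni adjustment swapped in for illustration, since the abstract does not say which correction is used. All function names (`chi2_p_value`, `min_p_cutpoint`, `fit_stump`, `overoptimism`) are hypothetical.

```python
import math
import random


def chi2_p_value(x, y, cut):
    """p-value of the 1-df chi-square test on the 2x2 table (x <= cut) by y."""
    a = sum(1 for xi, yi in zip(x, y) if xi <= cut and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi <= cut and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi > cut and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi > cut and yi == 0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of association
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df


def min_p_cutpoint(x, y):
    """Try every midpoint between distinct values; return (cut, p_min, p_adjusted)."""
    values = sorted(set(x))
    cuts = [(lo + hi) / 2.0 for lo, hi in zip(values, values[1:])]
    if not cuts:
        return None, 1.0, 1.0
    p_min, best = min((chi2_p_value(x, y, c), c) for c in cuts)
    # Assumption: Bonferroni over the number of candidate cutpoints.
    return best, p_min, min(1.0, len(cuts) * p_min)


def fit_stump(x, y):
    """One-split tree: minimal p-value cutpoint, majority class on each side."""
    cut, _, _ = min_p_cutpoint(x, y)
    if cut is None:
        cut = math.inf  # no split possible: every case falls on the left side
    left = [yi for xi, yi in zip(x, y) if xi <= cut]
    right = [yi for xi, yi in zip(x, y) if xi > cut]
    majority = lambda labels: int(2 * sum(labels) >= len(labels)) if labels else 0
    return cut, majority(left), majority(right)


def predict(model, xi):
    cut, left_label, right_label = model
    return left_label if xi <= cut else right_label


def overoptimism(x, y, k=5, seed=0):
    """Return (apparent error, k-fold CV error, CV error minus apparent error)."""
    full_model = fit_stump(x, y)
    apparent = sum(predict(full_model, xi) != yi for xi, yi in zip(x, y)) / len(y)
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    wrong = 0
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # The cutpoint search is repeated inside each training fold.
        model = fit_stump([x[i] for i in train], [y[i] for i in train])
        wrong += sum(predict(model, x[i]) != y[i] for i in fold)
    cv_error = wrong / len(y)
    return apparent, cv_error, cv_error - apparent


# Synthetic stand-in for a continuous marker and a binary diagnosis.
rng = random.Random(1)
y_demo = [rng.randint(0, 1) for _ in range(100)]
x_demo = [rng.gauss(float(label), 1.0) for label in y_demo]
cut, p_min, p_adj = min_p_cutpoint(x_demo, y_demo)
apparent, cv_error, gap = overoptimism(x_demo, y_demo)
print(f"cutpoint={cut:.3f}  p_min={p_min:.4g}  p_adjusted={p_adj:.4g}")
print(f"apparent={apparent:.3f}  cv={cv_error:.3f}  overoptimism={gap:.3f}")
```

Because the cutpoint is chosen as the best of many candidates, the unadjusted `p_min` understates the chance of a spurious split; the adjusted value is never smaller. Repeating the search inside each fold keeps the cross-validated error from inheriting the same selection effect.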