Investigations on Stability and Overoptimism of Classification Trees by Using Cross-Validation

  • Authors:
  • Willi Sauerbrei

  • Venue:
  • ISMDA '01 Proceedings of the Second International Symposium on Medical Data Analysis
  • Year:
  • 2001

Abstract

Classification rules are often developed with tree-based methods. Using data from a diagnostic study in which Doppler flow signals were measured to distinguish malignant from benign breast tumors, I discuss the search for the cutpoint of a continuous variable that yields the minimal p-value, and the need to correct this p-value for multiple testing. Ignoring the correction strongly favors continuous variables during tree development and may lead to useless trees. I further investigate the influence of tree complexity by estimating the overoptimism, defined as the difference between the apparent error rate based on the original data and the error rate estimated by 5-fold cross-validation. Finally, I consider how the use of predefined cutpoints affects the development of trees and the resulting error rates.
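
The overoptimism estimate described in the abstract can be illustrated with a minimal sketch. This is not the paper's procedure: it assumes a single-split tree (a "stump") on one continuous variable in place of a full classification tree, it chooses the cutpoint by training error and not by a minimal p-value, and it uses simulated data, not the Doppler study. All function names (`fit_stump`, `error_rate`, `cv_error`) are hypothetical.

```python
import random

def predict(stump, x):
    """Classify x with a stump given as (cutpoint, label assigned above the cutpoint)."""
    cut, high_label = stump
    return high_label if x > cut else 1 - high_label

def fit_stump(xs, ys):
    """Pick the cutpoint and direction with the fewest training errors."""
    values = sorted(set(xs))
    # Candidate cutpoints: midpoints between adjacent distinct values.
    cuts = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
    best = None
    for cut in cuts:
        for high_label in (0, 1):
            errors = sum(predict((cut, high_label), x) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, cut, high_label)
    return best[1], best[2]

def error_rate(stump, xs, ys):
    """Fraction of observations the stump misclassifies."""
    return sum(predict(stump, x) != y for x, y in zip(xs, ys)) / len(ys)

def cv_error(xs, ys, k=5, seed=0):
    """k-fold cross-validated error: refit on k-1 folds, test on the held-out fold."""
    idx = list(range(len(ys)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    wrong = 0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        stump = fit_stump([xs[i] for i in train], [ys[i] for i in train])
        wrong += sum(predict(stump, xs[i]) != ys[i] for i in fold)
    return wrong / len(ys)

# Simulated data (an assumption for illustration): two overlapping classes.
rng = random.Random(1)
ys = [rng.randint(0, 1) for _ in range(100)]
xs = [rng.gauss(y, 1.0) for y in ys]

apparent = error_rate(fit_stump(xs, ys), xs, ys)  # error on the data used for fitting
cv = cv_error(xs, ys, k=5)                        # error estimated by 5-fold CV
overoptimism = cv - apparent

print(f"apparent error: {apparent:.3f}")
print(f"5-fold CV error: {cv:.3f}")
print(f"overoptimism: {overoptimism:.3f}")
```

The key design point is that the cutpoint search is repeated inside every fold. If the cutpoint were chosen once on the full data and only then evaluated by cross-validation, the held-out observations would already have influenced the rule and the estimated overoptimism would be too small.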