Machine Learning
Neural networks and logistic regression: Part I
Computational Statistics & Data Analysis
Development of classification rules is often based on tree methodology. Using data from a diagnostic study in which Doppler flow signals were measured to distinguish malignant from benign breast tumors, I will discuss the issues raised by searching for the cutpoint of a continuous variable that yields the minimal p-value, and the need to correct this p-value for multiple testing. Ignoring the correction strongly favors continuous variables during tree development and may lead to useless trees. I will further investigate the influence of tree complexity by estimating the overoptimism, defined as the difference between the apparent error rate on the original data and the error rate estimated by 5-fold cross-validation. Finally, I consider how using predefined cutpoints affects the development of trees and the resulting error rates.
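The minimal p-value search and its correction can be sketched in Python. This is a sketch under stated assumptions, not the procedure used in the study. The outcome is coded 0/1 (e.g. benign/malignant), each candidate cutpoint is tested with a Pearson chi-square test on the resulting 2x2 table, and cutpoints are restricted to the central range between the ε and 1 - ε quantiles of the variable (ε = 0.1 is an assumed choice). The abstract does not say which correction is applied. `corrected_p` assumes the approximation of Miller and Siegmund (1982) for maximally selected statistics, in the form given by Altman et al. (1994), which is intended for small p-values. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_pvalue_2x2(a, b, c, d):
    """P-value of the Pearson chi-square test (1 df, no continuity
    correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def minimal_p_cutpoint(x, y, eps=0.1):
    """Try every cutpoint of continuous x (binary outcome y in {0, 1}) whose
    left group lies within the assumed [eps, 1 - eps] quantile range and
    return (cutpoint, p_min). The rule is 'x <= cutpoint' vs 'x > cutpoint'."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    lo, hi = math.ceil(eps * n), math.floor((1 - eps) * n)
    total_events = sum(yy for _, yy in pairs)
    events_left = 0
    best_cut, p_min = None, 1.0
    for k in range(1, n):  # k = size of the group with x <= cutpoint
        events_left += pairs[k - 1][1]
        # Skip splits outside the allowed range and splits inside tied values.
        if k < lo or k > hi or pairs[k - 1][0] == pairs[k][0]:
            continue
        a, b = events_left, k - events_left
        c = total_events - events_left
        d = (n - k) - c
        p = chi2_pvalue_2x2(a, b, c, d)
        if p < p_min:
            best_cut, p_min = pairs[k - 1][0], p
    return best_cut, p_min


def corrected_p(p_min, eps=0.1):
    """Assumed correction for the minimal p-value search (Miller-Siegmund
    approximation as given by Altman et al., 1994); meant for small p_min."""
    if p_min >= 1.0:
        return 1.0
    if p_min <= 0.0:
        return 0.0
    nd = NormalDist()
    z = -nd.inv_cdf(p_min / 2)  # z = Phi^{-1}(1 - p_min / 2), computed stably
    phi = nd.pdf(z)
    p_cor = phi * (z - 1 / z) * math.log((1 - eps) ** 2 / eps ** 2) + 4 * phi / z
    return min(1.0, p_cor)
```

For example, with x = 1, ..., 10 and the first five observations benign, the search selects the cutpoint 5 with p_min ≈ 0.0016, and the corrected p-value is several times larger. This is the kind of gap that, if ignored, makes continuous variables look far more significant than they are when a tree chooses its splits.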
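The overoptimism estimate can likewise be sketched in Python, assuming a binary outcome and a deliberately simple, hypothetical classifier standing in for a real tree-growing procedure. `fit_stump`/`predict_stump` build a single-split tree chosen by training misclassification, not by the p-value criterion discussed above. Any `fit`/`predict` pair can be plugged in. The seeded assignment of observations to the 5 folds is an assumption made for reproducibility.

```python
import random


def fit_stump(X, y):
    """Hypothetical toy classifier: a depth-1 tree (one split on one feature)
    chosen to minimise training misclassification. X is a list of tuples of
    feature values, y a list of 0/1 labels."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({xi[j] for xi in X}):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            # Majority label on each side (ties and empty sides go to 0).
            lab_l = int(2 * sum(left) > len(left))
            lab_r = int(2 * sum(right) > len(right))
            err = sum(v != lab_l for v in left) + sum(v != lab_r for v in right)
            if best is None or err < best[0]:
                best = (err, j, t, lab_l, lab_r)
    return best[1:]  # (feature index, threshold, left label, right label)


def predict_stump(model, xi):
    j, t, lab_l, lab_r = model
    return lab_l if xi[j] <= t else lab_r


def error_rate(model, predict, X, y):
    """Proportion of misclassified observations."""
    return sum(predict(model, xi) != yi for xi, yi in zip(X, y)) / len(y)


def cv_error(fit, predict, X, y, k=5, seed=0):
    """k-fold cross-validated error rate; folds come from a seeded shuffle."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    wrong = 0
    for f in range(k):
        fold = set(idx[f::k])
        train = [i for i in idx if i not in fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        wrong += sum(predict(model, X[i]) != y[i] for i in fold)
    return wrong / len(y)


def overoptimism(fit, predict, X, y, k=5, seed=0):
    """Return (apparent error, CV error, overoptimism = CV - apparent)."""
    apparent = error_rate(fit(X, y), predict, X, y)  # model fitted and judged on the same data
    cv = cv_error(fit, predict, X, y, k, seed)
    return apparent, cv, cv - apparent
```

Replacing the stump with trees of increasing depth and comparing the returned overoptimism is one way to examine the effect of tree complexity. More complex trees typically fit the original data more closely, lowering the apparent error, without a matching drop in the cross-validated error.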