Journal of Multivariate Analysis
Classification of mixed categorical and continuous data is often performed using the location linear discriminant function, which assumes across-location homoscedasticity. In this paper, we investigate the hazard arising from routine application of this classifier under across-location heteroscedasticity. Two performance indexes, a limiting index and a first-order asymptotic index, are proposed and studied in a general setting. The first describes the limiting behavior of the classifier; the second corrects the bias due to finite sample size. Both indexes are illustrated under the assumption of unequal spherical covariance matrices across all locations. This is likely to be the case in most classification problems dealing with mixed categorical and continuous data. Results of a numerical study are reported.
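For readers unfamiliar with the classifier, the following is a minimal sketch of the textbook two-group location linear discriminant rule, not the performance indexes proposed in the paper. It assumes equal group priors, categorical variables already encoded as an integer location label, and no empty cells. The function names `fit_location_ldf` and `classify`, and the simulated data with unequal spherical covariances per location, are hypothetical illustrations.

```python
import numpy as np

def fit_location_ldf(X, loc, y, n_locations):
    """Estimate location-model parameters for two groups (y in {0, 1}).

    Returns per-cell means, per-group location probabilities, and a single
    covariance pooled over all groups and locations (the homoscedastic
    assumption that the classifier relies on).
    """
    d = X.shape[1]
    means = np.zeros((2, n_locations, d))
    probs = np.zeros((2, n_locations))
    pooled = np.zeros((d, d))
    dof = 0
    for g in (0, 1):
        for s in range(n_locations):
            cell = X[(y == g) & (loc == s)]
            if len(cell) == 0:
                raise ValueError("empty group/location cell")
            means[g, s] = cell.mean(axis=0)
            probs[g, s] = len(cell) / np.sum(y == g)
            centered = cell - means[g, s]
            pooled += centered.T @ centered
            dof += len(cell) - 1
    return means, probs, pooled / dof

def classify(x, s, means, probs, cov):
    """Allocate x, observed at location s, to group 0 or 1 (equal priors assumed)."""
    diff = means[0, s] - means[1, s]
    mid = 0.5 * (means[0, s] + means[1, s])
    score = diff @ np.linalg.solve(cov, x - mid)
    return 0 if score >= np.log(probs[1, s] / probs[0, s]) else 1

# Hypothetical data: two locations whose spherical covariances differ,
# i.e. the heteroscedastic setting the abstract is concerned with.
rng = np.random.default_rng(0)
scales = [1.0, 2.0]  # assumed per-location standard deviations
X_parts, loc_parts, y_parts = [], [], []
for g in (0, 1):
    for s in (0, 1):
        centre = np.full(2, 5.0 * g)
        X_parts.append(centre + scales[s] * rng.standard_normal((200, 2)))
        loc_parts.append(np.full(200, s))
        y_parts.append(np.full(200, g))
X = np.vstack(X_parts)
loc = np.concatenate(loc_parts)
y = np.concatenate(y_parts)

means, probs, cov = fit_location_ldf(X, loc, y, n_locations=2)
print(classify(np.zeros(2), 0, means, probs, cov))
```

Because the rule pools one covariance matrix over all cells, it ignores the per-location scales used to generate the data, which is the kind of misspecification whose effect on error rates the paper sets out to quantify.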