Optimal robust classifiers

  • Authors:
  • Edward R. Dougherty; Jiangping Hua; Zixiang Xiong; Yidong Chen

  • Affiliations:
  • Department of Electrical Engineering, Texas A&M University, 3128 TAMU, College Station, TX 77843-3128, USA and Department of Pathology, University of Texas, M. D. Anderson Cancer Center, Houston, ...; Department of Electrical Engineering, Texas A&M University, 3128 TAMU, College Station, TX 77843-3128, USA; Department of Electrical Engineering, Texas A&M University, 3128 TAMU, College Station, TX 77843-3128, USA; National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892-2152, USA

  • Venue:
  • Pattern Recognition
  • Year:
  • 2005

Abstract

Qualitatively, a filter is said to be "robust" if its performance degradation is acceptable for distributions close to the one for which it is optimal, that is, the one for which it has been designed. This paper adapts the signal-processing theory of optimal robust filters to classifiers. The distribution (the class-conditional distributions) to which the classifier is to be applied is parameterized by a state vector, and the principal issue is to choose a design state that is optimal in comparison to all other states relative to some measure of robustness. A minimax robust classifier is one whose worst performance over all states is better than the worst performances of the other classifiers (defined at the other states). A Bayesian robust classifier is one whose expected performance is better than the expected performances of the other classifiers. The state corresponding to the Bayesian robust classifier is called the maximally robust state. Minimax robust classifiers tend to give too much weight to states for which classification is very difficult, and therefore our effort is focused on Bayesian robust classifiers. Whereas the signal-processing theory of robust filtering concentrates on design with full distributional knowledge and a fixed number of observation variables (features), design via training from sample data and feature selection are so important for classification that robustness optimality must be considered from these perspectives, in particular for small samples. In this context, for a given sample size, we will be concerned with the maximally robust state-feature pair. All definitions are independent of the classification rule; however, applications are only considered for linear and quadratic discriminant analysis, for which there are parametric forms for the optimal discriminants.
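
As a rough illustration of the two criteria described in the abstract, the following minimal sketch compares the minimax and Bayesian choices of design state on a toy problem. Everything concrete here is an assumption made for illustration and is not taken from the paper: a single feature, class 0 distributed as N(0, 1) and class 1 as N(state, 1) with equal class priors, a finite set of four states, and the hypothetical names `error`, `states`, and `state_prior`. The classifier designed at a state thresholds at half that state's mean, which is the Bayes-optimal rule when the true state equals the design state.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def error(design, state):
    """Error at `state` of the classifier designed at `design`.

    Toy model (an assumption, not from the paper): class 0 ~ N(0, 1),
    class 1 ~ N(state, 1), equal class priors. The classifier designed
    at `design` assigns class 1 when x > design / 2.
    """
    t = design / 2.0
    false_positive = 1.0 - phi(t)      # class 0 sample falls above t
    false_negative = phi(t - state)    # class 1 sample falls at or below t
    return 0.5 * false_positive + 0.5 * false_negative


# Hypothetical finite state space and prior probabilities over the states.
states = [0.5, 1.0, 2.0, 3.0]
state_prior = [0.1, 0.2, 0.3, 0.4]


def worst_error(design):
    """Worst-case error of the classifier designed at `design`."""
    return max(error(design, s) for s in states)


def expected_error(design):
    """Error of the classifier designed at `design`, averaged over states."""
    return sum(p * error(design, s) for p, s in zip(state_prior, states))


# Minimax: best worst-case error. Bayesian: best expected error.
minimax_state = min(states, key=worst_error)
maximally_robust_state = min(states, key=expected_error)

print("minimax design state:", minimax_state)
print("maximally robust (Bayesian) design state:", maximally_robust_state)
```

Under these assumed numbers the minimax criterion selects the design state 0.5, the state at which the classes overlap most, while the Bayesian criterion selects 2.0. This matches the abstract's remark that minimax robustness tends to be driven by the states for which classification is hardest.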