Toward a Fundamental Theory of Optimal Feature Selection: Part I

  • Authors:
  • Salvatore D. Morgera and Lokesh Datta

  • Affiliations:
  • Department of Electrical Engineering, Concordia University, Montreal, P.Q., Canada (both authors)

  • Venue:
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Year:
  • 1984


Abstract

Several authors have studied the problem of dimensionality reduction or feature selection using statistical distance measures, e.g., the Chernoff coefficient, Bhattacharyya distance, I-divergence, and J-divergence, because direct use of the probability-of-classification-error expression was generally felt to be computationally or mathematically intractable. We show that for the difficult problem of testing one weakly stationary Gaussian stochastic process against another, when the mean vectors are similar and the covariance matrices (patterns) differ, the probability-of-error expression may be dealt with directly using a combination of classical methods and distribution function theory. The results offer a new and accurate finite-dimensionality, information-theoretic strategy for feature selection and are shown, by examples, to be superior to the well-known Kadota-Shepp approach, which employs distance measures and asymptotics in its formulation. The present Part I deals with the theory; Part II deals with the implementation of a computer-based real-time pattern classifier that takes into account the realistic quasi-stationarity of the patterns.
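As context (not part of the original abstract): in the Gaussian setting the abstract describes, with identical mean vectors and differing covariance matrices, the Bhattacharyya distance D_B has a closed form and yields the equal-prior upper bound Pe <= (1/2)exp(-D_B). The sketch below is a minimal Python illustration, with toy covariance matrices and sample sizes chosen for this example only, comparing that distance-measure bound against a Monte Carlo estimate of the actual Bayes probability of error.

```python
# Illustrative sketch, not code from the paper: Bhattacharyya distance
# versus a Monte Carlo estimate of the Bayes probability of error for
# two zero-mean Gaussian classes that differ only in covariance.
import numpy as np
from scipy.stats import multivariate_normal

def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """D_B = (1/8)(mu2-mu1)' S^{-1} (mu2-mu1)
           + (1/2) ln(det S / sqrt(det S1 det S2)),  S = (S1 + S2)/2."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu2 - mu1
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)    # slogdet for numerical stability
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return term_mean + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

def bayes_error_mc(mu1, cov1, mu2, cov2, n=200_000, seed=0):
    """Monte Carlo Bayes error for equal priors via the likelihood-ratio rule."""
    rng = np.random.default_rng(seed)
    p1 = multivariate_normal(mu1, cov1)
    p2 = multivariate_normal(mu2, cov2)
    x1 = rng.multivariate_normal(mu1, cov1, n)      # samples from class 1
    x2 = rng.multivariate_normal(mu2, cov2, n)      # samples from class 2
    err1 = np.mean(p1.logpdf(x1) <= p2.logpdf(x1))  # class 1 misclassified
    err2 = np.mean(p2.logpdf(x2) < p1.logpdf(x2))   # class 2 misclassified
    return 0.5 * (err1 + err2)

mu = np.zeros(2)                               # identical mean vectors
cov1 = np.eye(2)                               # class-1 covariance "pattern"
cov2 = np.array([[3.0, 0.8], [0.8, 0.5]])      # class-2 covariance "pattern"

d_b = bhattacharyya_distance(mu, cov1, mu, cov2)
print(f"Bhattacharyya distance:      {d_b:.4f}")
print(f"Bound on Pe, 0.5*exp(-D_B):  {0.5 * np.exp(-d_b):.4f}")
print(f"Monte Carlo Bayes error:     {bayes_error_mc(mu, cov1, mu, cov2):.4f}")
```

For covariance-only differences of this kind the distance-measure bound is typically loose relative to the simulated error, which is consistent with the abstract's motivation for working with the probability-of-error expression directly.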