From Fisher's linear discriminant analysis to NLDA, or the story of the solution of a very difficult nonlinear classification problem

  • Authors: José Barahona Da Fonseca
  • Affiliations: Department of Electrical Engineering and Computer Science, New University of Lisbon, Caparica, Portugal
  • Venue: SMO'06 Proceedings of the 6th WSEAS International Conference on Simulation, Modelling and Optimization
  • Year: 2006

Abstract

This work deals with pattern classification of single pap-smear cells from an existing database developed at Herlev University Hospital [1]-[2], containing 917 cells characterized by 20 numerical features and classified into 7 classes by human experts. Medically, the method can be used to detect pre-malignant cells in the uterine cervix before they progress into cancer. Available cell features, such as the area, position and brightness of the nucleus and cytoplasm, are used to classify cells as normal or abnormal. We began by addressing this problem with a modified Kohonen neural network that took the classification errors into account, but even after long hours of fine-tuning a set of parameters we obtained only 66.7% correct classifications. Using Fisher's linear discriminant analysis (LDA) we obtained a similar result, 66.8% correct classifications. We therefore concluded that our classification problem is nonlinear and that our modified Kohonen network is essentially equivalent to LDA. We then implemented NLDA with a very simple feedforward neural network; varying the number of sigmoidal neurons in the first hidden layer and training with backpropagation (BP) for only 50 epochs, we obtained a surprising 98.3% correct classifications in the best of five successive runs with random weight initialization and 60 sigmoidal neurons in the first hidden layer. Next we scaled the input data so that all variables have unit variance and obtained 99.1% correct classifications after 1,000 epochs of training; additionally forcing zero mean in all variables yielded an even better result of 99.8%, i.e. 2 errors in 917 classifications. Finally, we compare our solution with recent works, and our implementation of NLDA with more sophisticated neural networks that also approximate LDA.
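
For readers who want a concrete picture of the pipeline outlined in the abstract, the sketch below contrasts Fisher's LDA with a standardize-then-MLP classifier (zero mean, unit variance, one hidden layer of 60 sigmoidal neurons trained with backpropagation) using scikit-learn. It is only an illustrative sketch, not the authors' actual implementation: the file name herlev_cells.csv, the label column cell_class, and the training settings are assumptions, and the paper's exact evaluation protocol is not reproduced here.

```python
# Hedged sketch: LDA baseline vs. an NLDA-style MLP on 20-feature cell data.
# Assumes the Herlev pap-smear features are available as a local CSV
# ("herlev_cells.csv" and the "cell_class" column are hypothetical names).
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 917 cells x 20 numerical features, plus a 7-class expert label column.
data = pd.read_csv("herlev_cells.csv")
X = data.drop(columns=["cell_class"]).to_numpy()
y = data["cell_class"].to_numpy()

# Linear baseline: Fisher's LDA (the abstract reports about 66.8% here).
lda = LinearDiscriminantAnalysis().fit(X, y)
print("LDA accuracy:", lda.score(X, y))

# NLDA-style model: standardize to zero mean / unit variance, then a simple
# feedforward network with 60 sigmoidal hidden neurons trained by
# backpropagation (plain SGD here; settings are illustrative assumptions).
nlda = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(60,), activation="logistic",
                  solver="sgd", learning_rate_init=0.1,
                  max_iter=1000, random_state=0),
)
nlda.fit(X, y)
print("NLDA-style MLP accuracy:", nlda.score(X, y))
```

As in the abstract, accuracy is computed over all 917 cells; a held-out split or cross-validation would be needed to estimate generalization performance.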