Improving k-NN for Human Cancer Classification Using the Gene Expression Profiles

  • Authors:
  • Manuel Martín-Merino;Javier Las Rivas

  • Affiliations:
  • Universidad Pontificia de Salamanca, Salamanca, Spain 37002;Cancer Research Center (CIC-IBMCC, CSIC/USAL), Salamanca, Spain

  • Venue:
  • IDA '09 Proceedings of the 8th International Symposium on Intelligent Data Analysis: Advances in Intelligent Data Analysis VIII
  • Year:
  • 2009

Abstract

The k-Nearest Neighbor (k-NN) classifier has been applied to the identification of cancer samples from gene expression profiles with encouraging results. However, k-NN usually relies on Euclidean distances, which often fail to reflect sample proximities accurately. Non-Euclidean dissimilarities capture different features of the data and should be integrated in order to reduce misclassification errors. In this paper, we learn a linear combination of dissimilarities using a regularized kernel alignment algorithm. The weights of the combination are learnt in a HRKHS (Hyper Reproducing Kernel Hilbert Space) using a Semidefinite Programming algorithm. This approach allows us to incorporate a smoothing term that penalizes the complexity of the family of distances and avoids overfitting. The experimental results suggest that the proposed method outperforms other metric learning strategies and improves on the classical k-NN algorithm based on a single dissimilarity.
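
To make the idea of combining dissimilarities concrete, the following is a minimal sketch, not the method of the paper. It replaces the regularized HRKHS/Semidefinite Programming step with a much simpler heuristic: each dissimilarity is weighted in proportion to the (non-negative) kernel-target alignment of its centered kernel with the ideal kernel yy^T, and k-NN is then run on the weighted sum. The three dissimilarities (Euclidean, Manhattan, correlation-based), the synthetic two-class data, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def euclidean(X):
    # Pairwise Euclidean distances between rows (samples) of X.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))

def manhattan(X):
    # Pairwise L1 distances between rows of X.
    return np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)

def correlation(X):
    # 1 - Pearson correlation between samples (rows are treated as variables).
    return 1.0 - np.corrcoef(X)

def centered_kernel(D):
    # Double centering of squared dissimilarities, as in classical MDS.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (D ** 2) @ J

def alignment(K, y):
    # Kernel-target alignment between K and the ideal kernel y y^T.
    T = np.outer(y, y)
    return np.sum(K * T) / (np.linalg.norm(K) * np.linalg.norm(T))

def learn_weights(D_list, y):
    # Heuristic stand-in for the SDP: weights proportional to clipped alignment.
    a = np.array([max(alignment(centered_kernel(D), y), 0.0) for D in D_list])
    if a.sum() == 0.0:
        return np.full(len(D_list), 1.0 / len(D_list))
    return a / a.sum()

def knn_predict(D_test_train, y_train, k=3):
    # Majority vote over the k nearest training samples; labels are +1 / -1.
    nearest = np.argsort(D_test_train, axis=1)[:, :k]
    votes = y_train[nearest].sum(axis=1)
    return np.where(votes >= 0, 1, -1)

def make_toy_data(n_per_class=30, n_genes=20, seed=0):
    # Synthetic "expression profiles": two classes with opposite gene patterns.
    rng = np.random.default_rng(seed)
    half = n_genes // 2
    pattern = np.concatenate([np.full(half, 2.0), np.full(n_genes - half, -2.0)])
    X = np.vstack([pattern + rng.normal(size=(n_per_class, n_genes)),
                   -pattern + rng.normal(size=(n_per_class, n_genes))])
    y = np.concatenate([np.ones(n_per_class, dtype=int),
                        -np.ones(n_per_class, dtype=int)])
    return X, y

def run_demo(seed=0, k=3):
    X, y = make_toy_data(seed=seed)
    idx = np.random.default_rng(seed + 1).permutation(len(y))
    train, test = idx[:40], idx[40:]
    # Rescale each dissimilarity by its mean so the scales are comparable
    # (this uses no label information).
    D_list = [D / D.mean() for D in (euclidean(X), manhattan(X), correlation(X))]
    # Weights are learned from the training block only.
    w = learn_weights([D[np.ix_(train, train)] for D in D_list], y[train])
    D_comb = sum(wi * D for wi, D in zip(w, D_list))
    pred = knn_predict(D_comb[np.ix_(test, train)], y[train], k=k)
    return w, float(np.mean(pred == y[test]))

w, acc = run_demo()
print("weights:", np.round(w, 3), "test accuracy:", acc)
```

Unlike the approach described in the abstract, this heuristic has no smoothing term and therefore no protection against overfitting the weights; it only illustrates how several dissimilarities can be merged into one before running k-NN.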