Feature extraction using clustering of protein

  • Authors:
  • Isis Bonet;Yvan Saeys;Ricardo Grau Ábalo;María M. García;Robersy Sanchez;Yves Van de Peer

  • Affiliations:
  • Center of Studies on Informatics, Central University of Las Villas, Santa Clara, Villa Clara, Cuba;Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Ghent University, Belgium;Center of Studies on Informatics, Central University of Las Villas, Santa Clara, Villa Clara, Cuba;Center of Studies on Informatics, Central University of Las Villas, Santa Clara, Villa Clara, Cuba;Research Institute of Tropical Roots, Tuber Crops and Banana (INIVIT), Biotechnology Group, Santo Domingo, Villa Clara, Cuba;Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Ghent University, Belgium

  • Venue:
  • CIARP'06 Proceedings of the 11th Iberoamerican conference on Progress in Pattern Recognition, Image Analysis and Applications
  • Year:
  • 2006

Abstract

In this paper we investigate the use of a clustering algorithm as a feature extraction technique for finding new features to represent protein sequences. In particular, our work focuses on predicting HIV protease resistance to drugs. We use a biologically motivated similarity function based on the contact energy of each amino acid and its position in the sequence. The performance measure takes into account both the reliability of the clustering and the validity of the classification. The k-means algorithm was used for clustering, and an SVM evaluated with 10-fold cross-validation was used for classification. The best results were obtained by reducing the initial set of 99 features to a lower-dimensional set of 36 to 66 features.
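The feature-reduction idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each of the 99 sequence positions is already encoded as one numeric value per sequence, it swaps in plain Euclidean distance for the paper's contact-energy and position-based similarity function, and it assumes that the positions in a cluster are averaged into a single new feature. The names `kmeans` and `extract_features` and the random toy data are hypothetical, and the SVM classification step is omitted.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means with squared Euclidean distance.

    Returns one cluster label (0..k-1) per point. Assumption: Euclidean
    distance stands in for the paper's biologically motivated similarity.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def extract_features(X, k, seed=0):
    """Cluster the columns of X (sequence positions) and average each
    non-empty cluster into one new feature (an assumed aggregation)."""
    columns = [list(col) for col in zip(*X)]
    labels = kmeans(columns, k, seed=seed)
    used = sorted(set(labels))  # skip clusters that ended up empty
    reduced = [
        [
            sum(v for v, lab in zip(row, labels) if lab == c) / labels.count(c)
            for c in used
        ]
        for row in X
    ]
    return reduced, labels


# Toy data only: 20 synthetic "sequences" with 99 numeric positions each.
rng = random.Random(42)
X = [[rng.uniform(-1.0, 1.0) for _ in range(99)] for _ in range(20)]
reduced, labels = extract_features(X, k=36)
print(len(X[0]), "->", len(reduced[0]), "features")
```

In a full pipeline the reduced matrix would then be passed to a classifier and scored with 10-fold cross-validation, repeating the procedure for different values of k to find the most useful number of clusters.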