Predicting protein-peptide binding affinity by learning peptide-peptide distance functions

  • Authors:
  • Chen Yanover;Tomer Hertz

  • Affiliations:
  • School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel;School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel

  • Venue:
  • RECOMB'05: Proceedings of the 9th Annual International Conference on Research in Computational Molecular Biology
  • Year:
  • 2005

Abstract

Many important cellular response mechanisms are activated when a peptide binds to an appropriate receptor. In the immune system, the recognition of pathogen peptides begins when they bind to cell membrane Major Histocompatibility Complexes (MHCs). MHC proteins then carry these peptides to the cell surface to allow the activation of cytotoxic T-cells. The MHC binding cleft is highly polymorphic, and therefore protein-peptide binding is highly specific. Developing computational methods for predicting protein-peptide binding is important for vaccine design and for the treatment of diseases such as cancer. Previous learning approaches address the binding prediction problem using traditional margin-based binary classifiers. In this paper we propose a novel approach for predicting binding affinity. Our approach is based on learning a peptide-peptide distance function. Moreover, we learn a single peptide-peptide distance function over an entire family of proteins (e.g., MHC class I). This distance function can be used to compute the affinity of a novel peptide to any of the proteins in the given family. To learn these peptide-peptide distance functions, we formalize the problem as a semi-supervised learning problem with partial information in the form of equivalence constraints. Specifically, we propose to use DistBoost [1, 2], which is a semi-supervised distance learning algorithm. We compare our method to various state-of-the-art binding prediction algorithms on MHC class I and MHC class II datasets. In almost all cases, our method outperforms all of its competitors. One of the major advantages of our novel approach is that it can also learn an affinity function over proteins for which only a small number of labeled peptides exist. In these cases, DistBoost's performance gain, when compared to other computational methods, is even more pronounced.
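
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not implement DistBoost. The peptide sequences, the function names, the normalized Hamming distance used as a stand-in for the learned distance function, and the median-based scoring rule are all assumptions made for illustration. The sketch shows how binding labels for one protein might be turned into equivalence constraints, and how a peptide-peptide distance could be turned into an affinity score for a novel peptide.

```python
from itertools import combinations
from statistics import median


def equivalence_constraints(binders, non_binders):
    """Build (peptide_a, peptide_b, label) triples for a single protein.

    label +1: both peptides bind the protein (treated as "similar").
    label -1: one peptide binds and the other does not ("dissimilar").
    """
    positive = [(a, b, +1) for a, b in combinations(binders, 2)]
    negative = [(a, b, -1) for a in binders for b in non_binders]
    return positive + negative


def hamming_distance(p, q):
    """Stand-in distance (assumption): fraction of mismatched positions.

    A learned distance function would replace this in practice.
    """
    if len(p) != len(q):
        raise ValueError("peptides must have equal length")
    return sum(x != y for x, y in zip(p, q)) / len(p)


def affinity_score(peptide, known_binders, distance=hamming_distance):
    """Hypothetical scoring rule: negated median distance to known binders.

    A higher score means the peptide lies closer to the known binders.
    """
    return -median(distance(peptide, b) for b in known_binders)


# Hypothetical 9-mer peptides for one protein (illustrative data only).
binders = ["LLFGYPVYV", "GLFGYPVYV", "LLFGYAVYV"]
non_binders = ["AAAAAAAAA"]

constraints = equivalence_constraints(binders, non_binders)
score_similar = affinity_score("LLFGYPVYA", binders)
score_dissimilar = affinity_score("AAAAAAAAA", binders)
```

Because `affinity_score` takes the distance as a parameter, a function learned from constraints pooled over a whole protein family could be passed in place of `hamming_distance` without changing the scoring code.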