Multiple Instance Learning Allows MHC Class II Epitope Predictions Across Alleles

  • Authors:
  • Nico Pfeifer; Oliver Kohlbacher

  • Affiliations:
  • Division for Simulation of Biological Systems, Center for Bioinformatics Tübingen, Eberhard Karls University Tübingen, Tübingen, Germany 72076

  • Venue:
  • WABI '08 Proceedings of the 8th international workshop on Algorithms in Bioinformatics
  • Year:
  • 2008

Abstract

The human adaptive immune response relies on the recognition of short peptides by proteins of the major histocompatibility complex (MHC). MHC class II molecules are responsible for the recognition of antigens external to a cell. Understanding their specificity is an important step in the design of peptide-based vaccines. The high degree of polymorphism in MHC class II makes the prediction of peptides that bind (and then usually cause an immune response) a challenging task. Typically, these predictions rely on machine learning methods, so a sufficient number of data points is required. Due to the scarcity of data, reliable prediction models are currently available for only about 7% of all known alleles.

We show how to transform the problem of MHC class II binding peptide prediction into a well-studied machine learning problem called multiple instance learning. For alleles with sufficient data, we show how to build a well-performing predictor using standard kernels for multiple instance learning. Furthermore, we introduce a new method for training a classifier for an allele without requiring binding data for the target allele. Instead, we use binding peptide data from other alleles and similarities between the structures of the MHC class II alleles to guide the learning process. This allows, for the first time, the construction of predictors for about two thirds of all known MHC class II alleles. The average performance of these predictors on 14 test alleles is 0.71, measured as area under the ROC curve.

Availability: The methods are integrated into the EpiToolKit framework, for which a webserver is available at http://www.epitoolkit.org/mhciimulti
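
The multiple instance learning formulation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each peptide is treated as a bag whose instances are its length-9 windows (candidate binding cores), and it compares bags with a normalized set kernel, one standard kernel for multiple instance learning. The window length, the toy position-identity base kernel, and all function names are assumptions made for illustration only.

```python
from itertools import product
from math import sqrt

CORE_LEN = 9  # assumed length of a candidate binding core


def peptide_to_bag(peptide, core_len=CORE_LEN):
    """Return every contiguous window of length core_len as the bag's instances."""
    if len(peptide) < core_len:
        return [peptide]  # too short to slide a window: single-instance bag
    return [peptide[i:i + core_len] for i in range(len(peptide) - core_len + 1)]


def instance_kernel(a, b):
    """Toy base kernel (assumption): fraction of positions with identical residues."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def set_kernel(bag_a, bag_b):
    """Sum the base kernel over all pairs of instances from the two bags."""
    return sum(instance_kernel(a, b) for a, b in product(bag_a, bag_b))


def normalized_set_kernel(bag_a, bag_b):
    """Set kernel scaled so that every non-empty bag has similarity 1 with itself."""
    denom = sqrt(set_kernel(bag_a, bag_a) * set_kernel(bag_b, bag_b))
    return set_kernel(bag_a, bag_b) / denom if denom else 0.0


# Arbitrary example sequences, not taken from any binding data set.
bag_1 = peptide_to_bag("ACDEFGHIKLMNP")
bag_2 = peptide_to_bag("ACDEFGHIKLMNQRS")
similarity = normalized_set_kernel(bag_1, bag_2)
```

A kernel matrix built this way over labeled peptides could be passed to any kernel classifier that accepts precomputed kernels. In practice the base kernel would need to reflect amino acid similarity rather than exact identity.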