Similarity boosting for label noise tolerance in protein-chemical interaction prediction

  • Authors:
  • Aaron Smalter Hall; Jun Huan; Gerald Lushington

  • Affiliations:
  • University of Kansas, Lawrence, Kansas; University of Kansas, Lawrence, Kansas; University of Kansas, Lawrence, Kansas

  • Venue:
  • Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
  • Year:
  • 2011

Abstract

The analysis of protein-chemical reactions on a large scale is critical to understanding the complex, interrelated mechanisms that govern biological life at the cellular level. Chemical proteomics is a new research area aimed at proteome-wide screening of such chemical-protein interactions. To model the diverse and complex chemical-protein interaction space, recent work has turned to local models. Local models improve generalization by training a series of independent models, each localized to predict a single interaction. One limitation of this approach is that the localized models are not tolerant to noise in the interaction labels, and such noise is characteristic of much protein-chemical interaction data. This work proposes and evaluates a boosting framework that incorporates sample similarity to localize base models to appropriate regions of the interaction space, thereby ensuring that similar samples are given similar predictions and providing a measure of tolerance to noise in the training labels. The framework is described and compared to local models and several other competing classification methods. Chemical-protein interaction data sets are constructed from publicly available data, and a series of cross-validation experiments is performed to compare the noise tolerance, accuracy, sensitivity, and specificity of the various methods.
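
The evaluation quantities named in the abstract can be illustrated with a small sketch. The code below is not the authors' experimental protocol. It only shows one common way to simulate label noise (flipping a fixed fraction of binary labels) and the standard definitions of accuracy, sensitivity, and specificity. The function names `flip_labels` and `confusion_metrics`, the 0/1 label encoding, and the noise rate are all assumptions made for illustration.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of binary (0/1) labels with a fixed fraction flipped.

    Hypothetical helper: uniform random flipping is an assumed noise model,
    not one taken from the paper.
    """
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def confusion_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: fraction of true interactions that are recovered.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of non-interactions correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


if __name__ == "__main__":
    clean = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    noisy = flip_labels(clean, noise_rate=0.2)
    # Comparing noisy labels against the clean ones shows how much
    # corruption a classifier trained on `noisy` would have to tolerate.
    print(confusion_metrics(clean, noisy))
```

In a noise-tolerance experiment of this kind, a classifier would typically be trained on the flipped labels and scored against the clean ones across cross-validation folds, with the noise rate varied to see how quickly each metric degrades.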