Boosting multiclass learning with repeating codes and weak detectors for protein subcellular localization

  • Authors:
  • Chung-Chih Lin;Yuh-Show Tsai;Yu-Shi Lin;Tai-Yu Chiu;Chia-Cheng Hsiung;May-I. Lee;Jeremy C. Simpson;Chun-Nan Hsu

  • Affiliations:
  • -;-;-;-;-;-;-;-

  • Venue:
  • Bioinformatics
  • Year:
  • 2007

Quantified Score

Hi-index 3.84

Visualization

Abstract

Motivation: Determining locations of protein expression is essential to understand protein function. Advances in green fluorescence protein (GFP) fusion proteins and automated fluorescence microscopy allow for rapid acquisition of large collections of protein localization images. Recognition of these cell images requires an automated image analysis system. Approaches taken by previous work concentrated on designing a set of optimal features and then applying standard machine-learning algorithms. In fact, trends of recent advances in machine learning and computer vision can be applied to improve the performance. One trend is the advances in multiclass learning with error-correcting output codes (ECOC). Another trend is the use of a large number of weak detectors with boosting for detecting objects in images of real-world scenes. Results: We take advantage of these advances to propose a new learning algorithm, AdaBoost.ERC, coupled with weak and strong detectors, to improve the performance of automatic recognition of protein subcellular locations in cell images. We prepared two image data sets of CHO and Vero cells and downloaded a HeLa cell image data set in the public domain to evaluate our new method. We show that AdaBoost.ERC outperforms other AdaBoost extensions. We demonstrate the benefit of weak detectors by showing significant performance improvements over classifiers using only strong detectors. We also empirically test our method's capability of generalizing to heterogeneous image collections. Compared with previous work, our method performs reasonably well for the HeLa cell images. Availability: CHO and Vero cell images, their corresponding feature sets (SSLF and WSLF), our new learning algorithm, AdaBoost.ERC, and Supplementary Material are available at http://aiia.iis.sinica.edu.tw/ Contact: chunnan@iis.sinica.edu.tw Supplementary information: Supplementary data are available at Bioinformatics online.