Voting fuzzy k-NN to predict protein subcellular localization from normalized amino acid pair compositions

  • Authors:
  • Thai Quang Tung;Doheon Lee;Dae-Won Kim;Jong-Tae Lim

  • Affiliations:
  • Department of BioSystems, KAIST, Daejeon, Korea;Department of BioSystems, KAIST, Daejeon, Korea;Department of BioSystems, KAIST, Daejeon, Korea;Department of Computer Engineering, Kongju National University, Chungcheongnam-do, Korea

  • Venue:
  • PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2005

Abstract

Databanks contain a huge number of protein sequences whose functions are not known. Since the biological functions of these proteins are closely correlated with their subcellular localization, it is important to develop a system that automatically predicts subcellular localization from sequences for large-scale genome analysis. In this paper, we first propose a new formula to estimate the composition of amino acid pairs for feature extraction, and then we present a voting scheme that combines a set of fuzzy k-nearest-neighbor (k-NN) classifiers to predict subcellular locations. To detect sequence-order features, each individual classifier is constructed using a different type of feature, including amino acid and amino acid pair compositions. We apply our method to several datasets and achieve significant improvements.
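
The pipeline described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the proposed normalization formula, so `pair_composition` uses plain relative pair frequencies as a stand-in; the fuzzy k-NN follows the common inverse-distance membership weighting with crisp training labels; and the vote is a simple plurality over the classifiers. All function names, the `gap` parameter, and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def aa_composition(seq):
    """Fraction of each of the 20 standard residues in seq."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]

def pair_composition(seq, gap=0):
    """Relative frequency of residue pairs separated by `gap` positions.

    Assumption: plain frequencies stand in for the paper's normalized
    formula, which the abstract does not specify.
    """
    step = gap + 1
    counts = Counter(seq[i] + seq[i + step] for i in range(len(seq) - step))
    total = sum(counts[p] for p in PAIRS) or 1
    return [counts[p] / total for p in PAIRS]

def fuzzy_knn(train, x, k=3, m=2.0, eps=1e-9):
    """Return {label: membership} for x from its k nearest training points.

    train is a list of (feature_vector, label) with crisp labels; each
    neighbour contributes weight 1 / d^(2/(m-1)), normalized to sum to 1.
    """
    nearest = sorted(train, key=lambda t: math.dist(t[0], x))[:k]
    weights = [(1.0 / (math.dist(v, x) ** (2.0 / (m - 1.0)) + eps), y)
               for v, y in nearest]
    total = sum(w for w, _ in weights)
    memberships = {}
    for w, y in weights:
        memberships[y] = memberships.get(y, 0.0) + w / total
    return memberships

def vote(labelled, query, extractors, k=3):
    """Plurality vote over one fuzzy k-NN classifier per feature extractor.

    Ties go to the label that was voted for first.
    """
    ballots = []
    for extract in extractors:
        train = [(extract(s), y) for s, y in labelled]
        u = fuzzy_knn(train, extract(query), k=k)
        ballots.append(max(u, key=u.get))
    return Counter(ballots).most_common(1)[0][0]

# Toy, made-up sequences purely for illustration (not real proteins).
labelled = [
    ("AAAAAC", "cytoplasm"),
    ("AAACAA", "cytoplasm"),
    ("WWWWYW", "nucleus"),
    ("WYWWWW", "nucleus"),
]
extractors = [
    aa_composition,
    pair_composition,                       # adjacent pairs
    lambda s: pair_composition(s, gap=1),   # pairs one residue apart
]
prediction = vote(labelled, "AAAACA", extractors, k=3)
print(prediction)
```

Using a separate classifier per feature type, rather than concatenating all features into one vector, keeps the 400-dimensional pair features from swamping the 20-dimensional composition features in the distance computation; whether the paper combines them in exactly this way is not stated in the abstract.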