Motivation: With protein sequences entering databanks at an explosive pace, early determination of the family or subfamily class of a newly found enzyme molecule becomes important, because the class is directly related to which specific target the enzyme acts on, as well as to its catalytic process and biological function. Unfortunately, determining the class by experiments alone is both time-consuming and costly. In a previous study, the covariant-discriminant algorithm was introduced to identify the 16 subfamily classes of oxidoreductases. Although the results were quite encouraging, the entire prediction process was based on the amino acid composition alone and included no sequence-order information, so the problem merits further investigation.

Results: To incorporate sequence-order effects into the predictor, the 'amphiphilic pseudo amino acid composition' is introduced to represent the statistical sample of a protein. The novel representation contains 20 + 2λ discrete numbers: the first 20 numbers are the components of the conventional amino acid composition; the next 2λ numbers are a set of correlation factors that reflect different hydrophobicity and hydrophilicity distribution patterns along a protein chain. A new predictor is developed on the basis of this concept and formulation scheme. The self-consistency test, the jackknife test and independent dataset tests all show that the success rates obtained by the new predictor are significantly higher than those of the previous predictors. The significant enhancement in success rates also implies that the distribution of hydrophobicity and hydrophilicity of the amino acid residues along a protein chain plays a very important role in the protein's structure and function.

Contact: kchou@san.rr.com
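To make the 20 + 2λ representation concrete, the following is a minimal Python sketch of one way such a vector could be computed. The abstract does not give the formulas, so several details here are assumptions: the correlation factor at sequence gap k is taken as the average product of standardized property values of residues k positions apart, the weight `w = 0.5` and the joint normalization are illustrative choices, and the two property scales (Kyte-Doolittle hydropathy and Hopp-Woods hydrophilicity, as commonly tabulated) are stand-ins that may differ from the scales used by the authors. The function name `amphiphilic_pseaac` is hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed stand-in scales (verify against a reference table before real use).
KYTE_DOOLITTLE = dict(zip(AMINO_ACIDS, [
    1.8, 2.5, -3.5, -3.5, 2.8, -0.4, -3.2, 4.5, -3.9, 3.8,
    1.9, -3.5, -1.6, -3.5, -4.5, -0.8, -0.7, 4.2, -0.9, -1.3]))
HOPP_WOODS = dict(zip(AMINO_ACIDS, [
    -0.5, -1.0, 3.0, 3.0, -2.5, 0.0, -0.5, -1.8, 3.0, -1.8,
    -1.3, 0.2, 0.0, 0.2, 3.0, 0.3, -0.4, -1.5, -3.4, -2.3]))


def standardize(scale):
    """Rescale a property table to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mu, sigma = mean(values), pstdev(values)
    return {a: (scale[a] - mu) / sigma for a in AMINO_ACIDS}


def amphiphilic_pseaac(seq, lam, scales=(KYTE_DOOLITTLE, HOPP_WOODS), w=0.5):
    """Return a list of 20 + 2*lam numbers for a protein sequence (sketch)."""
    n = len(seq)
    if not 0 <= lam < n:
        raise ValueError("lam must satisfy 0 <= lam < len(seq)")
    if any(r not in AMINO_ACIDS for r in seq):
        raise ValueError("sequence contains non-standard residues")

    std_scales = [standardize(s) for s in scales]

    # First 20 components: conventional amino acid composition.
    counts = Counter(seq)
    freqs = [counts[a] / n for a in AMINO_ACIDS]

    # Next 2*lam components: for each gap k, one correlation factor per
    # property (hydrophobicity first, then hydrophilicity).
    taus = []
    for k in range(1, lam + 1):
        for h in std_scales:
            products = (h[seq[i]] * h[seq[i + k]] for i in range(n - k))
            taus.append(sum(products) / (n - k))

    # Assumed joint normalization so that all components sum to 1.
    denom = sum(freqs) + w * sum(taus)
    return [f / denom for f in freqs] + [w * t / denom for t in taus]


if __name__ == "__main__":
    vec = amphiphilic_pseaac("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", lam=3)
    print(len(vec))  # 20 + 2*3 components
```

With λ = 0 the vector reduces to the plain amino acid composition. Larger λ adds correlation factors for longer-range sequence order, but λ must stay below the length of the shortest sequence in the dataset.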