Subcellular Localization Prediction through Boosting Association Rules

Authors:
Yongwook Yoon;Gary Geunbae Lee
Affiliations:
Pohang University of Science and Technology, Pohang;Pohang University of Science and Technology, Pohang
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2012

Abstract

Computational methods for predicting protein subcellular localization have used various types of features, including N-terminal sorting signals, amino acid compositions, and text annotations from protein databases. Our approach does not use biological knowledge such as sorting signals or homologues; it uses only protein sequence information. The method divides a protein sequence into short k-mer sequence fragments, which can be treated like word features in document classification. A large number of class association rules are mined from the full length of the protein sequence examples, from the N-terminus to the C-terminus. A boosting algorithm is then applied to those rules to build the final classifier. Experimental results on benchmark data sets show that our method performs well in terms of both classification performance and test coverage. The results also imply that the k-mer sequence features that determine subcellular location do not necessarily occur at specific positions in a protein sequence. An online prediction service implementing our method is available at http://isoft.postech.ac.kr/research/BCAR/subcell.
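
The pipeline the abstract describes (k-mer fragments, class association rules, boosting) can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes a binary task with labels +1/-1, restricts each rule to a single k-mer antecedent, and swaps in standard discrete AdaBoost as the boosting step. The function names, the thresholds `min_support` and `min_conf`, and the toy sequences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Return the set of overlapping k-mers found anywhere in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mine_rules(examples, k=3, min_support=2, min_conf=0.8):
    """Mine single-antecedent class association rules of the form k-mer -> label.

    examples: list of (sequence, label) pairs.
    Returns a list of (kmer, label, support, confidence) tuples.
    Assumption: one k-mer per rule; real rule miners also combine items.
    """
    total = Counter()      # sequences containing each k-mer
    by_label = Counter()   # sequences containing the k-mer and carrying the label
    for seq, label in examples:
        for km in kmers(seq, k):
            total[km] += 1
            by_label[(km, label)] += 1
    rules = []
    for (km, label), n in by_label.items():
        conf = n / total[km]
        if n >= min_support and conf >= min_conf:
            rules.append((km, label, n, conf))
    return rules


def boost(examples, rules, k=3, rounds=5):
    """Discrete AdaBoost over rules, for labels in {+1, -1}.

    Each rule acts as a weak classifier: it votes for its own label when its
    k-mer is present and for the opposite label otherwise.
    """
    if not rules:
        return []
    feats = [kmers(seq, k) for seq, _ in examples]
    ys = [y for _, y in examples]
    w = [1.0 / len(examples)] * len(examples)
    ensemble = []
    for _ in range(rounds):
        best = None
        for km, label, _, _ in rules:
            preds = [label if km in f else -label for f in feats]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, km, label, preds)
        err, km, label, preds = best
        if err >= 0.5:     # no rule beats chance on the current weights
            break
        err = max(err, 1e-10)  # avoid division by zero for perfect rules
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, km, label))
        # Up-weight misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * y * p) for wi, y, p in zip(w, ys, preds)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble


def predict(ensemble, seq, k=3):
    """Weighted vote of the boosted rules; returns +1 or -1."""
    f = kmers(seq, k)
    score = sum(a * (label if km in f else -label) for a, km, label in ensemble)
    return 1 if score >= 0 else -1


# Toy data (made up): +1 sequences share "KLA", -1 sequences share "DEQ".
examples = [
    ("MKLAAAGT", 1), ("GGKLAAPQ", 1), ("TTKLAWWW", 1),
    ("MSDEQRST", -1), ("PPDEQGGH", -1), ("YYDEQNNN", -1),
]
rules = mine_rules(examples, k=3, min_support=3, min_conf=1.0)
ensemble = boost(examples, rules)
print(predict(ensemble, "AAKLAQQ"), predict(ensemble, "QQDEQAA"))
```

Because `kmers` ignores where a fragment occurs, a rule fires wherever its k-mer appears in the sequence, which mirrors the abstract's observation that the discriminative features need not sit at fixed positions.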