Multilabel Learning via Random Label Selection for Protein Subcellular Multilocations Prediction

Authors:
Xiao Wang; Guo-Zheng Li
Affiliations:
Tongji University, Shanghai and Zhengzhou University of Light Industry, Zhengzhou; Tongji University, Shanghai and China Academy of Chinese Medical Sciences, Beijing
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2013

Abstract

Predicting protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more subcellular location sites. Most existing protein subcellular localization methods handle only single-location proteins. In the past few years, only a few methods have been proposed to tackle proteins with multiple locations. However, they adopt a simple strategy, namely transforming each multilocation protein into multiple single-location proteins, which does not take correlations among different subcellular locations into account. In this paper, a novel method named multilabel learning via random label selection (RALS), which extends the simple binary relevance (BR) method, is proposed to learn from multilocation proteins effectively and efficiently. RALS does not explicitly find the correlations among labels; instead, it attempts to learn them implicitly from the data by augmenting the original feature space with randomly selected labels as additional input features. Through a fivefold cross-validation test on a benchmark data set, we demonstrate that our proposed method, which considers label correlations, clearly outperforms the baseline BR method, which does not, indicating that correlations among different subcellular locations exist and contribute to improved prediction performance. Experimental results on two benchmark data sets also show that our proposed method achieves significantly higher performance than some other state-of-the-art methods in predicting subcellular multilocations of proteins. The prediction web server is publicly available at http://levis.tongji.edu.cn:8080/bioinfo/MLPred-Euk/.
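
The label-augmentation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions. The abstract does not say which base learner is used, so a toy nearest-centroid classifier stands in for it. It also does not say how the label features are filled in at prediction time, so the sketch assumes they come from a first-pass plain BR prediction. The names `fit_rals` and `predict_rals` and the parameter `k` (the number of randomly selected labels per classifier) are hypothetical.

```python
import random


def fit_centroids(X, y):
    """Toy nearest-centroid binary classifier: store the mean vector per class."""
    dim = len(X[0])
    centroids = {}
    for cls in (0, 1):
        rows = [x for x, t in zip(X, y) if t == cls]
        if rows:  # skip a class that never occurs in the training data
            centroids[cls] = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    return centroids


def predict_centroids(centroids, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda cls: dist(centroids[cls]))


def fit_rals(X, Y, k, seed=0):
    """Train, per label, a plain BR model and a model on label-augmented features."""
    rng = random.Random(seed)
    n_labels = len(Y[0])
    base, augmented = [], []
    for j in range(n_labels):
        y_j = [row[j] for row in Y]
        base.append(fit_centroids(X, y_j))  # plain binary relevance
        # Randomly pick up to k *other* labels to append as extra input features.
        others = [l for l in range(n_labels) if l != j]
        chosen = rng.sample(others, min(k, len(others)))
        X_aug = [x + [row[l] for l in chosen] for x, row in zip(X, Y)]
        augmented.append((chosen, fit_centroids(X_aug, y_j)))
    return base, augmented


def predict_rals(model, x):
    """Assumption: first-pass BR predictions stand in for the unknown label features."""
    base, augmented = model
    first_pass = [predict_centroids(c, x) for c in base]
    return [
        predict_centroids(c, x + [first_pass[l] for l in chosen])
        for chosen, c in augmented
    ]


# Toy data: two features, two perfectly correlated labels (illustrative only).
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
Y = [[0, 0], [0, 0], [1, 1], [1, 1]]
model = fit_rals(X, Y, k=1)
print(predict_rals(model, [0.05, 0.1]))
print(predict_rals(model, [1.0, 1.0]))
```

With `k=0` the augmented models see only the original features, so the sketch reduces to plain BR, the baseline the abstract compares against.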