Multitask Learning for Protein Subcellular Location Prediction

Authors:
Qian Xu;Sinno Jialin Pan;Hannah Hong Xue;Qiang Yang
Affiliations:
Hong Kong University of Science and Technology, Clearwater Bay, Kowloon;Hong Kong University of Science and Technology, Clearwater Bay, Kowloon;Hong Kong University of Science and Technology, Clearwater Bay, Kowloon;Hong Kong University of Science and Technology, Clearwater Bay, Kowloon
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2011

Abstract

Protein subcellular localization is the problem of predicting, by computational methods, where a protein resides within a cell. A protein's location can indicate its key functions, so accurate prediction of subcellular localization supports protein function prediction, genome annotation, and the identification of drug targets. Machine learning methods such as Support Vector Machines (SVMs) have been applied to this problem, but they have been shown to suffer from a lack of annotated training data in each species under study. To overcome this data sparsity problem, we observe that some organisms are related to each other, so there may be commonalities across organisms that can be discovered and used to supplement the data available for each localization task. In this paper, we formulate the protein subcellular localization problem as one of multitask learning across different organisms. We adapt and compare two specializations of multitask learning algorithms on 20 different organisms. Our experimental results show that multitask learning performs much better than traditional single-task methods. Among the multitask learning methods, we found that multitask kernels and supertype kernels, which share parameters, perform slightly better than multitask learning that shares latent features. The most significant improvement in localization accuracy is about 25 percent. We find that if the organisms are very different or only remotely related from a biological point of view, jointly training the multiple models does not lead to significant improvement. However, if they are closely related biologically, multitask learning can do much better than individual learning.
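
The abstract does not spell out the form of the multitask kernels it evaluates. As a rough illustration of the general idea only, the sketch below uses one form that is common in regularized multitask learning: a base kernel on features multiplied by a task-coupling term (1/mu + [same task]). The RBF base kernel, the `mu` and `gamma` parameters, the function names, and the toy feature vectors and organism labels are all assumptions made for this example; none of them is taken from the paper.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Base RBF kernel on feature vectors (an assumed choice of base kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def multitask_kernel(x, task_x, z, task_z, mu=1.0, gamma=1.0):
    """Kernel over (features, task) pairs: (1/mu + [task_x == task_z]) * k(x, z).

    Small mu makes the shared term dominate, coupling tasks strongly;
    large mu makes the tasks nearly independent.
    """
    same_task = 1.0 if task_x == task_z else 0.0
    return (1.0 / mu + same_task) * rbf_kernel(x, z, gamma)


def gram_matrix(samples, mu=1.0, gamma=1.0):
    """Gram matrix for a list of (features, task) samples pooled across tasks."""
    return [
        [multitask_kernel(xi, ti, xj, tj, mu, gamma) for (xj, tj) in samples]
        for (xi, ti) in samples
    ]


# Hypothetical toy data: two-dimensional features, each tagged with an organism.
samples = [
    ([0.0, 1.0], "organism_A"),
    ([0.0, 1.0], "organism_B"),
    ([1.0, 0.0], "organism_A"),
]
G = gram_matrix(samples, mu=1.0, gamma=1.0)
```

In this sketch each organism plays the role of one task. A Gram matrix of this kind could be passed to any SVM implementation that accepts a precomputed kernel, so that proteins from related organisms contribute to each other's decision functions.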