A Novel Method for Prediction of Protein Domain Using Distance-Based Maximal Entropy

Authors:
Shuxue Zou; Yanxin Huang; Yan Wang; Chengquan Hu; Yanchun Liang; Chunguang Zhou
Affiliations:
College of Computer Science and Technology, Jilin University, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, 130012, China (all authors)
Venue:
ISNN '07: Proceedings of the 4th International Symposium on Neural Networks, Part II: Advances in Neural Networks
Year:
2007

Abstract

Detecting the boundaries of protein domains has been an important and challenging problem in experimental and computational structural biology. In this paper, domain detection is first formulated as an imbalanced-data learning problem, and a novel undersampling method based on distance-based maximal entropy in the feature space of support vector machines (SVMs) is proposed. Multiple measures, computed on multiple sequence alignments derived from a database search, quantify the domain information content of each position along the sequence. The method achieves an overall accuracy of about 87%, together with high sensitivity and specificity. Simulation results demonstrate that the method is useful not only for predicting the complete 3D structure of a protein but also for machine learning on imbalanced datasets in general.
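The abstract does not define its entropy criterion, so the following is only a minimal sketch under stated assumptions of what entropy-driven undersampling in an SVM feature space can look like. It assumes a Gaussian (RBF) kernel, measures distances in the kernel-induced feature space through the kernel trick, and greedily keeps majority-class samples that maximize a Parzen-window estimate of Renyi's quadratic entropy. The function names (`rbf_kernel`, `renyi_quadratic_entropy`, `entropy_undersample`), the parameter `gamma`, and the choice of Renyi entropy are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def renyi_quadratic_entropy(K):
    """Parzen-window estimate of Renyi's quadratic entropy: -log(mean kernel value)."""
    return -np.log(K.mean())


def entropy_undersample(X_major, n_keep, gamma=0.5):
    """Greedily keep n_keep majority-class rows that maximize the entropy estimate.

    Assumption: this Renyi-entropy criterion stands in for the paper's
    (unspecified) distance-based maximal entropy. For the RBF kernel,
    ||phi(x) - phi(y)||^2 = 2 - 2 k(x, y), so the candidate with the smallest
    summed kernel similarity to the kept set is also the one farthest from it
    in feature space; adding it gives the largest entropy estimate among all
    one-point extensions of the kept set.
    """
    X_major = np.asarray(X_major, dtype=float)
    n_keep = min(n_keep, len(X_major))
    if n_keep <= 0:
        return np.array([], dtype=int)
    K = rbf_kernel(X_major, X_major, gamma)
    # Seed with the row least similar to the rest of the majority class.
    first = int(np.argmin(K.sum(axis=1)))
    selected = [first]
    sim_to_selected = K[first].copy()  # summed similarity of every row to the kept set
    remaining = np.ones(len(X_major), dtype=bool)
    remaining[first] = False
    while len(selected) < n_keep:
        candidates = np.flatnonzero(remaining)
        best = int(candidates[np.argmin(sim_to_selected[candidates])])
        selected.append(best)
        remaining[best] = False
        sim_to_selected += K[best]
    return np.array(selected)
```

In an SVM pipeline, the kept majority rows would be combined with all minority rows before training, and `gamma` would ideally match the SVM's own kernel parameter so that the undersampling and the classifier see the same feature space.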
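The abstract does not list the per-position measures it computes on the multiple sequence alignments. One widely used example of such a measure is the Shannon entropy of each alignment column, sketched below as an assumption for illustration only; `column_entropies` and its `gap` parameter are hypothetical, and the paper's actual measures may differ.

```python
import math
from collections import Counter


def column_entropies(alignment, gap="-"):
    """Shannon entropy (in bits) of the residue distribution in each MSA column.

    Gap characters are ignored; a column containing only gaps gets 0.0.
    Low values mark conserved columns, high values variable ones.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for j in range(length):
        # Count residues at position j across all aligned sequences.
        counts = Counter(seq[j] for seq in alignment if seq[j] != gap)
        total = sum(counts.values())
        # H = sum p * log2(1/p); the 0.0 start keeps all-gap columns at 0.0.
        entropies.append(
            sum(((c / total) * math.log2(total / c) for c in counts.values()), 0.0)
        )
    return entropies
```

For the toy alignment `["ACD", "ACE", "AGF", "A-G"]`, the first column is fully conserved (entropy 0.0), while the last column has four different residues (entropy 2.0 bits).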
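Because the abstract frames domain detection as an imbalanced problem, overall accuracy alone can be misleading, which is why sensitivity and specificity are reported alongside it. The sketch below assumes the standard confusion-matrix definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)); the paper may define these measures differently for domain-boundary positions, and `confusion_metrics` is a hypothetical helper.

```python
def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels in {0, 1}.

    Label 1 is treated as the positive (minority) class, e.g. a boundary position.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    return accuracy, sensitivity, specificity
```

For example, with 10 positive and 90 negative positions, a predictor that always outputs the majority label scores 90% accuracy but 0% sensitivity, which is the failure mode that undersampling the majority class aims to counter.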