Sequence-derived structural and physicochemical features have been used to develop models for predicting protein families. Here, we test the hypothesis that high-level functional groups of proteins can be classified by a very small set of global features extracted directly from sequence alone. To test this, we represent each protein by a small number of normalized global sequence features and classify the proteins into functional groups using support vector machines (SVMs). We also investigate the contribution of specific feature subsets to classification quality. Representing proteins by global features provides effective information for protein family classification, with results comparable to those obtained from a representation based on local sequence alignment scores. Moreover, combining global and local sequence features significantly improves classification performance.
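To illustrate what a representation by normalized global sequence features might look like, the following minimal sketch maps a protein sequence to a fixed-length vector. The abstract does not list the features actually used, so the choices here (amino acid composition, hydrophobic and charged fractions, log-scaled length), the residue groupings, and the function name `global_features` are assumptions made for illustration only.

```python
from collections import Counter
from typing import List
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative residue groupings; these are assumptions, not the paper's definitions.
HYDROPHOBIC = set("AVILMFWC")
CHARGED = set("DEKR")


def global_features(sequence: str) -> List[float]:
    """Map a protein sequence to a fixed-length vector of global features.

    The vector holds 20 amino acid frequencies, the hydrophobic fraction,
    the charged fraction, and log10 of the sequence length (23 values).
    """
    seq = sequence.upper()
    # Count only the 20 standard residues; other symbols (X, B, Z, ...) are ignored.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")

    # Frequencies are normalized by length, so proteins of different sizes are comparable.
    composition = [counts[aa] / total for aa in AMINO_ACIDS]
    hydrophobic = sum(counts[aa] for aa in HYDROPHOBIC) / total
    charged = sum(counts[aa] for aa in CHARGED) / total
    log_length = math.log10(total)

    return composition + [hydrophobic, charged, log_length]


# Hypothetical toy sequence, used only to show the call.
features = global_features("MKTAYIAKQR")
```

Vectors of this kind, one per protein, could then be passed to any SVM implementation together with the functional group labels. Because every protein maps to a vector of the same length, no sequence alignment is needed to compare two proteins.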