IEEE Transactions on Pattern Analysis and Machine Intelligence
Feature Extraction From Wavelet Coefficients for Pattern Recognition Tasks
IEEE Transactions on Pattern Analysis and Machine Intelligence
Prediction of protein subcellular localizations using moment descriptors and support vector machine
PRIB'06 Proceedings of the 2006 international conference on Pattern Recognition in Bioinformatics
The rapidly increasing number of sequences entering genome databanks has created a need for fully automated methods to analyze them. Knowing the cellular location of a protein is a key step towards understanding its function. Statistical prediction of protein attributes generally involves two core tasks: constructing a training dataset and formulating a predictive algorithm. The latter can be further divided into two subtasks: finding a mathematical representation that effectively captures a protein, and finding a powerful algorithm that performs the prediction accurately. Here, an improved evolutionary conservation algorithm was proposed to calculate a conservation score for each residue. Each protein was then represented as a feature vector built from multi-scale energy (MSE). In addition, each protein was represented by three other feature vectors, based on amino acid composition (AAC), a weighted auto-correlation function, and moment descriptors. Finally, a novel hybrid approach was developed that fuses the classifiers built on these four kinds of features through the product rule to predict 12 subcellular locations. Compared with existing methods, this approach provides better predictive performance. High success rates were obtained in both the jackknife cross-validation test and the independent dataset test, suggesting that introducing protein evolutionary information and fusing multi-feature classifiers are promising ideas that may also prove useful in other areas of molecular biology.
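The abstract does not define the improved conservation algorithm, the wavelet used for multi-scale energy, or the base classifiers, so the following is only a minimal sketch under stated assumptions. Per-residue conservation is approximated by one minus the normalized Shannon entropy of each alignment column (a common baseline, not the authors' improved algorithm). Multi-scale energy is taken as the energy of Haar wavelet detail coefficients at each scale of that conservation profile. Fusion uses the standard product rule over per-classifier posterior probabilities. The weighted auto-correlation and moment descriptor features are omitted because their exact definitions are not given here, and all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """20-dim AAC vector: fraction of each standard residue (other letters ignored)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def entropy_conservation(column):
    """ASSUMED stand-in for per-residue conservation (not the paper's algorithm):
    1 - normalized Shannon entropy of one alignment column. Gaps are ignored."""
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return 0.0
    n = len(residues)
    h = -sum((k / n) * math.log(k / n) for k in Counter(residues).values())
    return 1.0 - h / math.log(len(AMINO_ACIDS))


def multiscale_energy(signal, levels=3):
    """ASSUMED MSE definition: energy of Haar detail coefficients at each scale,
    followed by the energy of the final approximation coefficients."""
    approx = list(signal)
    energies = []
    for _ in range(levels):
        if len(approx) < 2:
            energies.append(0.0)  # nothing left to decompose at this scale
            continue
        if len(approx) % 2:
            approx = approx + [approx[-1]]  # pad odd lengths by repeating the last value
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


def product_rule(posteriors):
    """Fuse per-classifier class posteriors by elementwise product, then renormalize."""
    n_classes = len(posteriors[0])
    fused = [1.0] * n_classes
    for p in posteriors:
        fused = [f * q for f, q in zip(fused, p)]
    total = sum(fused)
    if total == 0:
        return [1.0 / n_classes] * n_classes  # every class vetoed: fall back to uniform
    return [f / total for f in fused]


if __name__ == "__main__":
    # Toy alignment columns for a 4-residue query (hypothetical data).
    columns = ["AAAA", "ACAA", "ACDE", "KKKR"]
    profile = [entropy_conservation(col) for col in columns]
    print("MSE:", multiscale_energy(profile, levels=2))
    print("AAC:", amino_acid_composition("MKVLAAGIA")[:5])
    # Posteriors from two hypothetical classifiers over 3 locations.
    print("fused:", product_rule([[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]))
```

In the pipeline the abstract describes, each feature set would feed its own classifier and only the classifiers' posterior outputs would be combined, which is what `product_rule` illustrates. The product rule implicitly treats the classifiers as roughly independent, and a single zero posterior vetoes a class outright, so in practice posteriors are often smoothed away from zero before fusion.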