The nature of statistical learning theory
Prediction of Enzyme Classification from Protein Sequence without the Use of Sequence Similarity
Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology
Subsequence-based feature map for protein function classification
Computational Biology and Chemistry
Grammatical inference in practice: a case study in the biomedical domain
ICGI'06 Proceedings of the 8th international conference on Grammatical Inference: algorithms and applications
Proceedings of the 27th Annual ACM Symposium on Applied Computing
WABI'07 Proceedings of the 7th international conference on Algorithms in Bioinformatics
We present a novel unsupervised method for extracting meaningful motifs from biological sequence data. This de novo motif extraction (MEX) algorithm is data driven, finding motifs that are not necessarily over-represented in the data. Applying MEX to the oxidoreductase class of enzymes, which contains approximately 7000 enzyme sequences, yields a relatively small set of motifs. This set spans a motif-space that is used for functional classification of the enzymes by an SVM classifier. The classification based on MEX motifs surpasses that of two other SVM-based methods: SVMProt, a method based on the analysis of physical-chemical properties of a protein generated from its sequence of amino acids, and an SVM applied to a Smith-Waterman distance matrix. Our findings demonstrate that the MEX algorithm extracts relevant motifs, supporting a successful sequence-to-function classification.
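To make the idea of a motif-space concrete, the following is a minimal sketch of how sequences might be mapped to feature vectors over a motif set. The abstract does not specify the encoding, so the binary occurrence scheme is an assumption, and the function name, motifs, and sequences are all hypothetical illustrations rather than MEX output.

```python
def motif_features(sequence, motifs):
    """Map a sequence to a binary vector over the motif set.

    Assumption: entry i is 1 if motifs[i] occurs anywhere in the
    sequence and 0 otherwise. The paper may use a different encoding.
    """
    return [1 if motif in sequence else 0 for motif in motifs]


# Hypothetical motifs and sequences, for illustration only.
motifs = ["GAGS", "CPHC", "NADP"]
sequences = {
    "seq1": "MKGAGSLLCPHCA",
    "seq2": "MTTNADPLLK",
}

# One feature vector per sequence; each dimension corresponds to one motif.
features = {name: motif_features(seq, motifs) for name, seq in sequences.items()}

for name, vector in features.items():
    print(name, vector)
```

Vectors of this kind, paired with functional class labels, could then be passed to any SVM implementation for training; that step is omitted here to keep the sketch dependency-free.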