A class of edit kernels for SVMs to predict translation initiation sites in eukaryotic mRNAs

Authors:
Haifeng Li;Tao Jiang
Affiliations:
University of California, Riverside, CA;University of California, Riverside, CA
Venue:
RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
Year:
2004

Abstract

The prediction of translation initiation sites (TISs) in eukaryotic mRNAs has been a challenging problem in computational molecular biology. In this paper, we present a new algorithm that recognizes TISs with very high accuracy. Our algorithm includes two novel ideas. First, we introduce a class of new sequence-similarity kernels based on string edit, called the edit kernels, for use with support vector machines (SVMs) in a discriminative approach to predicting TISs. The edit kernels are simple and have significant biological and probabilistic interpretations. Second, to avoid the high redundancy in the genetic code, we convert the region of an input mRNA sequence downstream of a putative TIS into an amino acid sequence before applying SVMs. The algorithm has been implemented and tested on previously published data. Our experimental results on real mRNA data show that both ideas greatly improve the prediction accuracy and that our method performs significantly better than methods based on neural networks and on SVMs with polynomial kernels or the Salzberg kernel.
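
The two ideas in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the kernel's formula, so the form used here, exp(-gamma * d) with d the plain Levenshtein distance, is an assumption chosen for illustration and may differ from the authors' definition; a kernel of this form is also not guaranteed to be positive semidefinite for every gamma. The function names (`translate_downstream`, `edit_distance`, `edit_kernel`), the window size `n_codons`, and the value of `gamma` are hypothetical.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_downstream(seq, tis_pos, n_codons=10):
    """Translate up to n_codons codons that follow the putative ATG at tis_pos.

    Assumption: tis_pos is the 0-based index of the 'A' in the candidate ATG.
    Unknown codons (for example, ones containing 'N') are mapped to 'X'.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or cDNA alphabets
    start = tis_pos + 3
    codons = [seq[i:i + 3] for i in range(start, start + 3 * n_codons, 3)]
    return "".join(CODON_TABLE.get(c, "X") for c in codons if len(c) == 3)


def edit_distance(a, b):
    """Levenshtein distance with unit costs, using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def edit_kernel(a, b, gamma=0.5):
    """Illustrative similarity: exp(-gamma * edit distance). Assumed form."""
    return math.exp(-gamma * edit_distance(a, b))


if __name__ == "__main__":
    mrna = "GCCACCATGGCTAAATGA"
    peptide = translate_downstream(mrna, tis_pos=6, n_codons=3)
    print(peptide, edit_kernel(peptide, "AR*", gamma=1.0))
```

Under these assumptions, a Gram matrix built from `edit_kernel` over the translated windows could be handed to any SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. Translating first means that synonymous codons compare as identical, which is the redundancy the abstract refers to.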