Splice site prediction using support vector machines with a Bayes kernel

Authors:
Ya Zhang;Chao-Hsien Chu;Yixin Chen;Hongyuan Zha;Xiang Ji
Affiliations:
School of Information Sciences and Technology, 301 K IST Building, Pennsylvania State University, University Park, PA 16802, USA;School of Information Sciences and Technology, 301 K IST Building, Pennsylvania State University, University Park, PA 16802, USA;Department of Computer Science, University of New Orleans, New Orleans, LA 70148, USA;Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA;NEC Laboratories America, Cupertino, CA 95014, USA
Venue:
Expert Systems with Applications: An International Journal
Year:
2006

Abstract

One of the most important tasks in correctly annotating genes in higher organisms is to accurately locate the DNA splice sites. Although existing methods achieve relatively high accuracy, most of them are computationally expensive. Because of the enormous number of DNA sequences to be processed, computational speed is an important consideration. In this paper, we present a new machine learning method for predicting DNA splice sites. It first applies a Bayes feature mapping (kernel) to project the data into a new feature space and then uses a linear Support Vector Machine (SVM) as a classifier to recognize the true splice sites. The computation time is linear in the number of sequences tested, and classification accuracy, precision, and recall are notably better than those of the Naive Bayes classifier. Our classification results are also comparable to those obtained by SVMs with polynomial kernels, while our method is significantly faster. This is a notable improvement in computational modeling, given the volume of DNA sequence data to be processed.
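The two-stage pipeline described in the abstract can be sketched roughly as follows. The abstract does not define the Bayes feature mapping, so this sketch assumes one plausible reading: each sequence position is mapped to the log-ratio of its smoothed class-conditional nucleotide probabilities, and a linear SVM then learns one weight per position. The function names, the toy sequences, the pseudocount, and the plain subgradient trainer are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def fit_position_probs(seqs, pseudocount=1.0):
    """Estimate smoothed P(nucleotide | position) from equal-length sequences."""
    total = len(seqs) + pseudocount * len(ALPHABET)
    probs = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        probs.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return probs

def bayes_features(seq, pos_probs, neg_probs):
    """Assumed mapping: one log-likelihood ratio per sequence position."""
    return [math.log(pos_probs[i][c] / neg_probs[i][c]) for i, c in enumerate(seq)]

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=50):
    """Subgradient descent on the L2-regularized hinge loss (labels are +1/-1)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - eta * lam) for wi in w]  # regularization shrinkage
            if margin < 1.0:  # hinge loss is active: step toward the example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
                b += eta * label
    return w, b

def fit(positives, negatives):
    """Estimate both class models, map the training data, and train the SVM."""
    pos_probs = fit_position_probs(positives)
    neg_probs = fit_position_probs(negatives)
    X = [bayes_features(s, pos_probs, neg_probs) for s in positives + negatives]
    y = [1] * len(positives) + [-1] * len(negatives)
    w, b = train_linear_svm(X, y)
    return pos_probs, neg_probs, w, b

def score(seq, model):
    """Decision value; a positive value means 'predicted splice site'."""
    pos_probs, neg_probs, w, b = model
    x = bayes_features(seq, pos_probs, neg_probs)
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Toy windows (made up): positives carry "GT" in the middle, negatives do not.
POSITIVES = ["AAGTAA", "CCGTCC", "GGGTGG", "TTGTTT"]
NEGATIVES = ["AAACAA", "CCCACC", "GGAAGG", "TTCCTT"]

model = fit(POSITIVES, NEGATIVES)
for candidate in ["ACGTCA", "ACCATG"]:
    print(candidate, "splice site" if score(candidate, model) > 0 else "not a site")
```

Under this reading, scoring one sequence costs a table lookup and one multiplication per position, which is consistent with the abstract's claim that computation time grows linearly with the number of sequences tested. Fixing every weight to 1 would reduce the decision value to the Naive Bayes log-odds (up to the prior term), so the SVM stage can be seen as reweighting positions rather than treating them equally.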