Splice sites define the boundaries of exonic regions and dictate protein synthesis and function. The splicing mechanism involves complex interactions among positional and compositional features of different lengths. Computational modeling of the underlying constructive information, with the aim of deciphering splicing-inducing elements and alternative splicing factors, is especially challenging. SpliceIT (Splice Identification Technique) is a hybrid method for splice site prediction that couples probabilistic modeling with discriminative computational or experimental features inferred from published studies, applied in two successive classification steps. The first step uses a Gaussian support vector machine (SVM) trained on a probabilistic profile that is extracted using two alternative position-dependent feature selection methods. In the second step, the resulting predictions are combined with known species-specific regulatory elements to induce a tree-based model. Performance evaluation on human and Arabidopsis thaliana splice site datasets shows that SpliceIT is highly accurate compared with current state-of-the-art predictors in terms of the maximum sensitivity-specificity tradeoff, without compromising space complexity and in a time-effective way. The source code and supplementary material are available at http://www.med.auth.gr/research/spliceit/.
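To make the idea of a position-dependent probabilistic profile more concrete, the following is a minimal sketch and not the SpliceIT implementation. It assumes a simple zero-order encoding in which each position of a fixed-length window is replaced by the log-odds of its nucleotide under true-site versus false-site frequencies. The function names, the pseudocount, and the toy sequences are all hypothetical; the abstract does not specify the two feature selection methods actually used. A vector of this kind is the sort of input a Gaussian SVM could be trained on in the first step.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_frequencies(seqs, pseudocount=1.0):
    """Per-position nucleotide probabilities with additive smoothing."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    total = len(seqs) + pseudocount * len(ALPHABET)
    table = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        table.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return table


def log_odds_profile(seq, pos_table, neg_table):
    """Encode one window as a vector of per-position log-odds scores."""
    return [
        math.log(pos_table[i][b] / neg_table[i][b])
        for i, b in enumerate(seq)
    ]


# Toy windows invented for illustration (assumption, not data from the paper).
true_sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
false_sites = ["ACGTTCGA", "TTGTCCAA", "GAGTACCT", "CCGTTAGC"]

pos_table = position_frequencies(true_sites)
neg_table = position_frequencies(false_sites)

# Each candidate window becomes a fixed-length numeric feature vector.
profile = log_odds_profile("AGGTAAGT", pos_table, neg_table)
decoy = log_odds_profile("TCGTCCAA", pos_table, neg_table)
```

In this toy setting the window resembling the true sites receives a positive summed score and the decoy a negative one. In a two-step scheme like the one described above, such vectors would feed the SVM, and the SVM output would then be combined with regulatory-element features in a tree-based model.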