String kernels model sequence similarity directly, without first extracting numerical features into a vector space. Because they capture complex traits of the sequences more faithfully, string kernels often achieve better prediction performance. RNA interference is an important biological mechanism with many therapeutic applications. In this setting, strings can represent the target messenger RNAs and the initiating short RNAs, and string kernels can be applied for learning and prediction. However, existing string kernels were not designed specifically for RNA applications. Moreover, most existing string kernels are n-gram based, so they suffer from high dimensionality and cannot preserve the ordering of subsequences. We propose a randomized string kernel for use with support vector regression, with the aim of better predicting silencing efficacy scores for candidate sequences and, ultimately, improving the efficiency of biological experiments. We show that this kernel is positive definite and analyze its randomization error rates. Empirical results on biological data demonstrate that the proposed kernel performs better than existing string kernels and achieves significant improvements over kernels computed from numerical descriptors extracted according to structural and thermodynamic rules. In addition, it is computationally more efficient.
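To make the n-gram limitation concrete, the following is a minimal sketch of a conventional n-gram (spectrum) string kernel, the kind of baseline the abstract contrasts against. It is not the proposed randomized kernel, whose construction the abstract does not describe. The function names and the example sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngram_counts(seq, n):
    """Count every contiguous length-n substring (n-gram) of seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def spectrum_kernel(s, t, n=3):
    """Inner product of the n-gram count vectors of s and t.

    The implicit feature space has one dimension per possible n-gram
    (4**n for the RNA alphabet A, C, G, U), which grows quickly with n.
    """
    cs, ct = ngram_counts(s, n), ngram_counts(t, n)
    # Iterate over the smaller table; n-grams missing from either
    # string contribute zero to the inner product.
    if len(cs) > len(ct):
        cs, ct = ct, cs
    return sum(count * ct[gram] for gram, count in cs.items())


def gram_matrix(seqs, n=3):
    """Symmetric kernel matrix over a list of sequences."""
    return [[spectrum_kernel(a, b, n) for b in seqs] for a in seqs]


# Made-up sequences: the second swaps the two halves of the first.
a = "AUGCCG"
b = "CCGAUG"
shared = spectrum_kernel(a, b, n=3)   # counts shared 3-grams (AUG, CCG)
self_sim = spectrum_kernel(a, a, n=3)
K = gram_matrix([a, b], n=3)
```

Here the two sequences still share the 3-grams AUG and CCG even though the halves are swapped, which shows how counting n-grams discards where each subsequence occurs. A matrix such as `K` could, under the assumption that scikit-learn is used, be passed to a support vector regressor configured with a precomputed kernel.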