We describe several families of k-mer based string kernels related to the recently presented mismatch kernel and designed for use with support vector machines (SVMs) for classification of protein sequence data. These new kernels (restricted gappy kernels, substitution kernels, and wildcard kernels) are based on feature spaces indexed by k-length subsequences ("k-mers") from the string alphabet $\Sigma$. However, for all kernels we define here, the kernel value $K(x,y)$ can be computed in $O(c_K(|x|+|y|))$ time, where the constant $c_K$ depends on the parameters of the kernel but is independent of the size $|\Sigma|$ of the alphabet. Thus the computation of these kernels is linear in the length of the sequences, like the mismatch kernel, but we improve upon the parameter-dependent constant $c_K = k^{m+1}|\Sigma|^m$ of the $(k,m)$-mismatch kernel. We compute the kernels efficiently using a trie data structure and relate our new kernels to the recently described transducer formalism. In protein classification experiments on two benchmark SCOP data sets, we show that our new faster kernels achieve SVM classification performance comparable to the mismatch kernel and the Fisher kernel derived from profile hidden Markov models, and we investigate the dependence of the kernels on parameter choice.
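To make the alphabet-independence claim concrete, here is a minimal Python sketch of a wildcard-style k-mer kernel. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the kernel definitions, so the sketch assumes features are k-length patterns containing at most m wildcard characters, with each wildcard down-weighted by a factor `lam`. It uses a hash map where the paper uses a trie, and the names `wildcard_features`, `wildcard_kernel`, and `lam` are hypothetical. Each k-mer expands into a number of patterns that depends only on k and m, never on $|\Sigma|$, so the work per sequence is linear in its length.

```python
from collections import Counter
from itertools import combinations


def wildcard_features(seq, k, m, lam=1.0, wildcard="*"):
    """Weighted counts of k-length patterns with at most m wildcards.

    Assumed definition (not taken from the abstract): a k-mer contributes
    lam**j to every pattern obtained by replacing j <= m of its positions
    with the wildcard. The wildcard must not occur in the alphabet.
    """
    feats = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        for j in range(m + 1):
            # Choose which j positions of this k-mer become wildcards.
            for positions in combinations(range(k), j):
                pattern = list(kmer)
                for p in positions:
                    pattern[p] = wildcard
                feats["".join(pattern)] += lam ** j
    return feats


def wildcard_kernel(x, y, k, m, lam=1.0):
    """Inner product of the two sparse feature maps."""
    fx = wildcard_features(x, k, m, lam)
    fy = wildcard_features(y, k, m, lam)
    if len(fx) > len(fy):
        fx, fy = fy, fx  # iterate over the smaller map
    return sum(v * fy[key] for key, v in fx.items() if key in fy)


# "AB" and "AC" share only the pattern "A*".
print(wildcard_kernel("AB", "AC", k=2, m=1))  # 1.0
```

With `m=0` the sketch reduces to exact k-mer counting. The explicit enumeration stores every pattern, which a trie traversal would avoid, so treat this as a readable reference for small inputs rather than a fast implementation.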