Large scale genomic sequence SVM classifiers

Authors:
Sören Sonnenburg; Gunnar Rätsch; Bernhard Schölkopf
Affiliations:
Fraunhofer Institute FIRST, Berlin, Germany; Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany; Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Venue:
ICML '05 Proceedings of the 22nd international conference on Machine learning
Year:
2005

Abstract

In genomic sequence analysis tasks such as splice site recognition or promoter identification, large numbers of training sequences are available, and are indeed needed to achieve sufficiently high classification performance. In this work we study two recently proposed and successfully used kernels: the Spectrum kernel and the Weighted Degree (WD) kernel. In particular, we suggest several extensions that use suffix trees, together with modifications of an SMO-like SVM training algorithm, in order to accelerate both the training of the SVMs and their evaluation on test sequences. Our simulations show that large-scale SVM training can be accelerated by a factor of 20 for the Spectrum kernel and by a factor of 4 for the WD kernel, while using much less memory (e.g. no kernel caching). Evaluation on new sequences is often several thousand times faster with the new techniques, depending on the number of support vectors. Our method allows us to train on sets as large as one million sequences.
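
For orientation, the two kernels named in the abstract can be sketched in their naive, unaccelerated form. This is a minimal illustration, not the authors' implementation: it uses no suffix trees and no SMO modifications. The function names `spectrum_kernel` and `wd_kernel` are hypothetical, and the WD weighting `beta_k = 2(d - k + 1) / (d(d + 1))` is the commonly used choice, assumed here because the abstract does not state it.

```python
from collections import Counter


def spectrum_kernel(x: str, y: str, k: int) -> int:
    """Naive Spectrum kernel: inner product of k-mer count vectors."""
    counts_x = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    counts_y = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    # Only k-mers present in both sequences contribute to the sum.
    return sum(c * counts_y[kmer] for kmer, c in counts_x.items())


def wd_kernel(x: str, y: str, d: int) -> float:
    """Naive Weighted Degree kernel for two sequences of equal length.

    Counts k-mers (k = 1..d) that match at the same position in both
    sequences, weighted by beta_k (an assumed, commonly used weighting).
    """
    if len(x) != len(y):
        raise ValueError("WD kernel requires sequences of equal length")
    total = 0.0
    for k in range(1, d + 1):
        beta_k = 2.0 * (d - k + 1) / (d * (d + 1))  # assumed weighting
        matches = sum(
            x[i:i + k] == y[i:i + k] for i in range(len(x) - k + 1)
        )
        total += beta_k * matches
    return total
```

Each call to either function costs time proportional to the sequence length (times `d` for the WD kernel), so evaluating a trained SVM naively requires one such call per support vector. That per-support-vector cost is the one the abstract reports reducing.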