Fast Kernel Methods for SVM Sequence Classifiers

Authors:
Pavel Kuksa;Vladimir Pavlovic
Affiliations:
Department of Computer Science, Rutgers University, Piscataway, NJ;Department of Computer Science, Rutgers University, Piscataway, NJ
Venue:
WABI'07 Proceedings of the 7th international conference on Algorithms in Bioinformatics
Year:
2007

Abstract

In this work we study string kernel methods for sequence analysis, focusing on species-level identification from short DNA fragments known as barcodes. We introduce efficient sorting-based algorithms for exact string k-mer kernels and then describe a divide-and-conquer technique for kernels with mismatches. Our algorithms for mismatch kernel matrix computation improve on the currently known time bounds. We then consider the mismatch kernel problem with feature selection and present efficient algorithms for it. Our experimental results show that, for string kernels with mismatches, kernel matrices can be computed 100-200 times faster than with traditional approaches. Kernel vector evaluations on new sequences show similar speedups. On several DNA barcode datasets, k-mer string kernels considerably improve identification accuracy over prior results. String kernels with feature selection achieve competitive performance with substantially fewer computations.
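As an illustration of the sorting-based idea for exact k-mer kernels, the following is a minimal sketch and not the authors' implementation. It assumes the standard spectrum-kernel definition, in which the kernel value for two sequences is the sum over all k-mers of the product of their occurrence counts. The function name `spectrum_kernel_matrix` is hypothetical, and Python's general-purpose comparison sort stands in for whatever specialized sorting procedure the paper uses.

```python
from collections import Counter
from itertools import groupby


def spectrum_kernel_matrix(seqs, k):
    """Exact k-mer kernel matrix: K[i][j] = sum over k-mers of count_i * count_j.

    Hypothetical helper for illustration only; assumes the standard
    spectrum-kernel definition.
    """
    n = len(seqs)
    # Collect every (k-mer, sequence index) pair from all sequences.
    pairs = [
        (s[p:p + k], i)
        for i, s in enumerate(seqs)
        for p in range(len(s) - k + 1)
    ]
    # Sorting places identical k-mers next to each other, so each distinct
    # k-mer can be processed in one contiguous run.
    pairs.sort()
    K = [[0] * n for _ in range(n)]
    for _, group in groupby(pairs, key=lambda t: t[0]):
        # Count how often this k-mer occurs in each sequence.
        counts = Counter(i for _, i in group)
        # Every pair of sequences sharing the k-mer gains count_i * count_j.
        for i, ci in counts.items():
            for j, cj in counts.items():
                K[i][j] += ci * cj
    return K
```

For example, with `seqs = ["ACGT", "ACGA", "TTTT"]` and `k = 2`, the first two sequences share the 2-mers `AC` and `CG`, so their kernel value is 2, while neither shares a 2-mer with `TTTT`.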