Biological Sequence Classification with Multivariate String Kernels

Authors: Pavel P. Kuksa
Affiliations: NEC Laboratories America Inc, Princeton
Venue: IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year: 2013

Abstract

String kernel-based machine learning methods have been highly successful in the analysis of structured and sequential data, often achieving state-of-the-art performance on tasks such as biological sequence classification, remote homology detection, and protein superfamily and fold prediction. However, typical string kernel methods operate on discrete one-dimensional strings (e.g., DNA or amino acid sequences). In this paper, we address multiclass biological sequence classification using multivariate representations, in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of physicochemical descriptors of individual amino acids), together with a class of multivariate string kernels that exploit these representations. On three protein sequence classification tasks, the proposed multivariate representations and kernels improve on existing state-of-the-art sequence classification methods by a significant 15-20 percent.
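
To make the idea of a kernel over sequences of feature vectors concrete, the following is a minimal illustrative sketch and not the kernel family proposed in the paper, whose definition the abstract does not give. It assumes each sequence is a list of equal-dimension feature vectors (for example, one profile column or descriptor vector per residue) and compares all pairs of length-k windows with a Gaussian similarity. The function names and the parameters `k` and `gamma` are hypothetical choices made for this example.

```python
import math
from typing import Sequence

Vector = Sequence[float]
MultivariateSequence = Sequence[Vector]


def window_similarity(a: MultivariateSequence, b: MultivariateSequence,
                      gamma: float) -> float:
    """Gaussian similarity between two equal-length windows of feature vectors."""
    sq_dist = sum((x - y) ** 2 for u, v in zip(a, b) for x, y in zip(u, v))
    return math.exp(-gamma * sq_dist)


def multivariate_kmer_kernel(s: MultivariateSequence, t: MultivariateSequence,
                             k: int = 2, gamma: float = 1.0) -> float:
    """Sum the window similarity over every pair of length-k windows.

    Illustrative assumption: a convolution-style kernel, not the paper's method.
    """
    total = 0.0
    for i in range(len(s) - k + 1):
        for j in range(len(t) - k + 1):
            total += window_similarity(s[i:i + k], t[j:j + k], gamma)
    return total


def normalized_kernel(s: MultivariateSequence, t: MultivariateSequence,
                      k: int = 2, gamma: float = 1.0) -> float:
    """Cosine-style normalization so that a sequence compared to itself scores 1."""
    kss = multivariate_kmer_kernel(s, s, k, gamma)
    ktt = multivariate_kmer_kernel(t, t, k, gamma)
    if kss == 0.0 or ktt == 0.0:
        # A sequence shorter than k has no windows; treat it as dissimilar.
        return 0.0
    return multivariate_kmer_kernel(s, t, k, gamma) / math.sqrt(kss * ktt)


# Toy sequences of 2-dimensional feature vectors (made-up values).
seq_a = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
seq_b = [[0.2, 0.8], [0.7, 0.3]]
similarity = normalized_kernel(seq_a, seq_b)
```

A matrix of such pairwise values could then be passed to any kernel classifier that accepts a precomputed kernel. The quadratic loop over window pairs is chosen for readability, not efficiency.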