2D similarity kernels for biological sequence classification

Authors:
Pavel P. Kuksa
Affiliations:
NEC Laboratories America, Inc, Princeton, NJ
Venue:
Proceedings of the 11th International Workshop on Data Mining in Bioinformatics
Year:
2012

Abstract

String kernel-based machine learning methods have been highly successful in practical analysis of structured and sequential data. They often achieve state-of-the-art performance on tasks such as document topic elucidation, biological sequence classification, and protein superfamily and fold prediction. However, typical string kernel methods rely on the analysis of discrete 1D string data (e.g., DNA or amino acid sequences). This work introduces new 2D kernel methods for sequence data given as sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physico-chemical descriptors). On three protein sequence classification tasks, the proposed 2D kernels show significant improvements of 15-20% over state-of-the-art sequence classification methods.
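
To make the idea of a kernel over sequences of feature vectors concrete, the following is a minimal illustrative sketch and not the kernel defined in the paper, whose exact form the abstract does not give. It assumes each sequence is a list of equal-length feature vectors (one row per residue), and it compares two sequences by summing a Gaussian similarity over all pairs of contiguous k-row windows. The names `windows`, `window_similarity`, and `kernel_2d`, and the parameters `k` and `gamma`, are hypothetical choices made for this example.

```python
import math


def windows(profile, k):
    """Return all contiguous k-row windows of a sequence of feature vectors."""
    return [profile[i:i + k] for i in range(len(profile) - k + 1)]


def window_similarity(a, b, gamma=1.0):
    """Gaussian similarity between two k x d windows (an assumed form)."""
    sq_dist = sum((x - y) ** 2
                  for row_a, row_b in zip(a, b)
                  for x, y in zip(row_a, row_b))
    return math.exp(-gamma * sq_dist)


def kernel_2d(p, q, k=2, gamma=1.0):
    """Sum window similarities over all pairs of k-row windows of p and q."""
    return sum(window_similarity(a, b, gamma)
               for a in windows(p, k)
               for b in windows(q, k))


# Toy sequences of 2-dimensional feature vectors (hypothetical data).
p = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
q = [[1.0, 0.0], [0.0, 1.0]]
value = kernel_2d(p, q, k=2, gamma=1.0)
```

Unlike a 1D k-mer kernel, which counts exact or inexact matches between discrete substrings, this sketch lets two windows contribute a graded similarity that depends on every feature value in the window. The resulting function is symmetric and can be plugged into any kernel method that accepts a precomputed Gram matrix.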