Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
String kernel-based machine learning methods have been highly successful in the analysis of structured and sequential data, and often achieve state-of-the-art performance on sequence analysis tasks such as biological sequence classification, remote homology detection, and protein superfamily and fold prediction. However, typical string kernel methods operate on discrete one-dimensional string data (e.g., DNA or amino acid sequences). In this paper, we address multiclass biological sequence classification using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physicochemical descriptors), together with a class of multivariate string kernels that exploit these representations. On three protein sequence classification tasks, the proposed multivariate representations and kernels yield significant improvements of 15-20 percent over existing state-of-the-art sequence classification methods.
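To make the contrast between discrete 1D strings and sequences of feature vectors concrete, the following minimal sketch first computes a standard k-spectrum kernel on discrete strings, then applies the same kernel to each quantized dimension of a sequence of feature vectors and sums the results. This is an illustration under stated assumptions, not the kernel family proposed in the paper: the function names, the per-dimension quantization with fixed bin edges, and the summation over dimensions are all hypothetical choices made here for exposition.

```python
from collections import Counter


def spectrum_features(seq, k):
    """Count every contiguous k-mer in a sequence of discrete symbols."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k):
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    return sum(count * fy[kmer] for kmer, count in fx.items())


def quantize(values, edges):
    """Map each real value to the index of the bin it falls in.

    `edges` is an ascending list of bin boundaries (an assumption of this
    sketch; the abstract does not specify any discretization scheme).
    """
    return [sum(v >= e for e in edges) for v in values]


def multivariate_kernel(X, Y, k, edges):
    """Illustrative multivariate kernel (hypothetical, not the paper's).

    X and Y are sequences of equal-dimension feature vectors, e.g. one
    vector of physicochemical descriptors per residue. Each dimension is
    quantized into a discrete string and the per-dimension spectrum
    kernels are summed.
    """
    total = 0
    for d in range(len(X[0])):
        xd = quantize([row[d] for row in X], edges)
        yd = quantize([row[d] for row in Y], edges)
        total += spectrum_kernel(xd, yd, k)
    return total


# Discrete 1D strings: shared 2-mers "AB" and "BA".
print(spectrum_kernel("ABAB", "BABA", 2))

# Sequence of 2-dimensional feature vectors compared with itself.
X = [[0.1, 0.9], [0.8, 0.2], [0.1, 0.9]]
print(multivariate_kernel(X, X, 2, edges=[0.5]))
```

Summing per-dimension kernels keeps the result a valid kernel, since a sum of positive semidefinite kernels is itself positive semidefinite; any other way of combining the dimensions would need its own justification.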