Motivation: Remote homology detection is among the most intensively researched problems in bioinformatics. Currently, discriminative approaches, especially kernel-based methods, provide the most accurate results. However, kernel methods also have several drawbacks: in many cases prediction of new sequences is computationally expensive; kernels often lack an interpretable model for analyzing characteristic sequence features; and most approaches rely on so-called hyperparameters, which complicate the application of methods across different datasets.

Results: We introduce a feature vector representation for protein sequences based on distances between short oligomers. The corresponding feature space arises from distance histograms for every possible pair of K-mers. Our distance-based approach offers important advantages in computational speed, while on common test data its prediction performance is highly competitive with state-of-the-art methods for protein remote homology detection. Furthermore, the learnt model can easily be analyzed in terms of discriminative features and, in contrast to other methods, our representation does not require any tuning of kernel hyperparameters.

Availability: Normalized kernel matrices for the experimental setup can be downloaded at www.gobics.de/thomas. Matlab code for computing the kernel matrices is available upon request.

Contact: thomas@gobics.de, peter@gobics.de
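The oligomer distance representation described under Results can be illustrated with a minimal Python sketch. This is not the authors' Matlab code. The function names, the inclusion of distance 0, the optional `max_dist` cutoff, and the cosine-style normalization are all assumptions made for illustration; the abstract states only that the features are distance histograms over pairs of K-mers and that normalized kernel matrices are provided.

```python
from collections import Counter
from math import sqrt


def oligomer_distance_histogram(seq, k=1, max_dist=None):
    """Sparse distance histogram over ordered K-mer pairs (illustrative sketch).

    The key (a, b, d) counts how often K-mer `a` starts at some position p
    while K-mer `b` starts at position p + d. Including d = 0 and capping
    d at `max_dist` are assumptions, not details taken from the abstract.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        return Counter()
    if max_dist is None:
        max_dist = n_kmers - 1
    kmers = [seq[i:i + k] for i in range(n_kmers)]
    counts = Counter()
    for p, a in enumerate(kmers):
        # Only distances that keep the second K-mer inside the sequence.
        for d in range(min(max_dist, n_kmers - 1 - p) + 1):
            counts[(a, kmers[p + d], d)] += 1
    return counts


def linear_kernel(x, y):
    """Dot product of two sparse feature vectors stored as Counters."""
    if len(x) > len(y):
        x, y = y, x  # iterate over the smaller vector
    return sum(v * y.get(key, 0) for key, v in x.items())


def normalized_kernel(x, y):
    """Kernel value scaled to [0, 1]; this normalization is an assumption."""
    denom = sqrt(linear_kernel(x, x) * linear_kernel(y, y))
    return linear_kernel(x, y) / denom if denom else 0.0


if __name__ == "__main__":
    h1 = oligomer_distance_histogram("MKVLAAGIVK", k=1)
    h2 = oligomer_distance_histogram("MKVLSAGIVR", k=1)
    print(normalized_kernel(h1, h2))
```

Because the feature map is explicit and the kernel is a plain dot product, a linear classifier trained on such vectors has one weight per (K-mer, K-mer, distance) triple, which is what makes the discriminative features directly inspectable, and no kernel hyperparameter needs to be tuned in this sketch.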