Motivation: Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns.

Results: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing a limited number of mismatches (mutations) between patterns. Thus, the kernels provide a biologically well-motivated way to compare protein sequences without relying on family-based generative models such as hidden Markov models. We compute the kernels efficiently using a mismatch tree data structure, which lets us calculate the contributions of all patterns occurring in the data in one pass while traversing the tree. When used with an SVM, the kernels enable fast prediction on test sequences. We report experiments on two benchmark SCOP datasets, where we show that the mismatch kernel used with an SVM classifier performs competitively with state-of-the-art methods for homology detection, particularly when very few training examples are available. Examination of the highest-weighted patterns learned by the SVM classifier recovers biologically important motifs in protein families and superfamilies.

Availability: SVM software is publicly available at http://microarray.cpmc.columbia.edu/gist. Mismatch kernel software is available upon request.
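To make the kernel and the one-pass tree traversal concrete, here is a minimal sketch in Python. It is not the authors' software. It assumes the usual (k, m) parameterization: patterns are k-mers, and a k-mer counts toward pattern β if it differs from β in at most m positions (Hamming distance). The function name `mismatch_kernel` and its `alphabet` parameter are hypothetical. The depth-first search walks prefixes of candidate patterns. Each live k-mer instance carries its running mismatch count, and an instance is pruned as soon as that count exceeds m. At a leaf (a full k-mer β), the surviving instances give each sequence's feature value for β, and their pairwise products are added to the Gram matrix.

```python
def mismatch_kernel(seqs, k, m, alphabet):
    """Unnormalized (k, m)-mismatch kernel Gram matrix via a depth-first
    mismatch-tree traversal (illustrative sketch, hypothetical API)."""
    n = len(seqs)
    K = [[0] * n for _ in range(n)]
    # Each live instance: (sequence index, start of its k-mer, mismatches so far).
    instances = [(i, p, 0) for i, s in enumerate(seqs)
                 for p in range(len(s) - k + 1)]

    def traverse(depth, live):
        if depth == k:
            # Leaf = one pattern beta. counts[i] is sequence i's feature value
            # for beta: how many of its k-mers lie within m mismatches of beta.
            counts = [0] * n
            for i, _, _ in live:
                counts[i] += 1
            for a in range(n):
                if counts[a]:
                    for b in range(n):
                        K[a][b] += counts[a] * counts[b]
            return
        for c in alphabet:
            # Extend the current prefix by symbol c; drop instances whose
            # mismatch count would exceed m (this prunes the whole subtree).
            nxt = []
            for i, p, mm in live:
                mm2 = mm + (seqs[i][p + depth] != c)
                if mm2 <= m:
                    nxt.append((i, p, mm2))
            if nxt:
                traverse(depth + 1, nxt)

    traverse(0, instances)
    return K


if __name__ == "__main__":
    # Toy two-letter alphabet; with k=2, m=1 each 2-mer matches 3 of the 4 patterns.
    print(mismatch_kernel(["AAB", "ABB"], k=2, m=1, alphabet="AB"))  # [[10, 9], [9, 10]]
```

With m = 0 the sketch reduces to counting exactly shared k-mers (the spectrum kernel). For protein data one would pass the 20-letter amino-acid alphabet. Dividing each entry by sqrt(K[x][x] * K[y][y]) is a common way to normalize for sequence length before training an SVM.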