Expressive comparison of strings is a prerequisite for the analysis of sequential data in many areas of computer science. However, comparing strings and assessing their similarity is not a trivial task, and there exist several contrasting approaches for defining similarity measures over sequential data. In this paper, we review three major classes of such similarity measures: edit distances, bag-of-words models, and string kernels. Each of these classes originates from a particular application domain and models the similarity of strings differently. We present these classes and the underlying comparisons in detail, highlight their advantages and differences, and provide basic algorithms supporting practical applications. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 296–304 DOI: 10.1002/widm.36
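To make the three classes concrete, here is a minimal Python sketch with one simple representative of each: the Levenshtein distance as an edit distance, a cosine similarity over word counts as a bag-of-words model, and an n-gram spectrum kernel as a string kernel. The function names, the whitespace tokenization, and the choice of these particular representatives are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Edit distance: fewest insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def bag_of_words_cosine(a: str, b: str) -> float:
    """Cosine similarity of word-count vectors (assumes whitespace tokenization)."""
    wa, wb = Counter(a.split()), Counter(b.split())
    dot = sum(wa[w] * wb[w] for w in wa.keys() & wb.keys())
    norm = sqrt(sum(v * v for v in wa.values())) * sqrt(sum(v * v for v in wb.values()))
    return dot / norm if norm else 0.0


def spectrum_kernel(a: str, b: str, n: int = 3) -> int:
    """Inner product of n-gram count vectors (one simple kind of string kernel)."""
    ga = Counter(a[i:i + n] for i in range(len(a) - n + 1))
    gb = Counter(b[i:i + n] for i in range(len(b) - n + 1))
    return sum(ga[g] * gb[g] for g in ga.keys() & gb.keys())


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(bag_of_words_cosine("the cat sat", "the cat ran"))
    print(spectrum_kernel("abab", "abab", n=2))
```

The sketch reflects the contrast the abstract draws: the edit distance counts character-level operations and is sensitive to order, the bag-of-words measure ignores word order entirely, and the spectrum kernel compares strings through the short substrings they share.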