Similarity measures for sequential data

Authors:
Konrad Rieck
Affiliations:
Machine Learning Group, Technische Universität Berlin, Berlin, Germany
Venue:
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Year:
2011

Abstract

Expressive comparison of strings is a prerequisite for the analysis of sequential data in many areas of computer science. However, comparing strings and assessing their similarity is not a trivial task, and there exist several contrasting approaches for defining similarity measures over sequential data. In this paper, we review three major classes of such similarity measures: edit distances, bag-of-word models, and string kernels. Each of these classes originates from a particular application domain and models the similarity of strings differently. We present these classes and their underlying comparisons in detail, highlight their advantages and differences, and provide basic algorithms supporting practical applications. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1:296–304. DOI: 10.1002/widm.36
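
The abstract names the three classes of measures without giving their definitions here, so the following is only a minimal illustrative sketch, not the algorithms from the paper. It assumes one common representative per class: the Levenshtein distance for edit distances, a cosine similarity over whitespace-separated word counts for bag-of-word models, and the k-spectrum kernel (an inner product of substring counts) for string kernels. The function names and the default k = 3 are hypothetical choices made for this example.

```python
from collections import Counter
from math import sqrt


def edit_distance(x: str, y: str) -> int:
    """Levenshtein distance (assumed representative of edit distances):
    minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            cost = 0 if cx == cy else 1
            curr.append(min(prev[j] + 1,           # delete cx
                            curr[j - 1] + 1,       # insert cy
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def bag_of_words_cosine(x: str, y: str) -> float:
    """Cosine similarity of word-count vectors (assumed bag-of-word variant).
    Words are split on whitespace; word order is ignored."""
    a, b = Counter(x.split()), Counter(y.split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """k-spectrum kernel (assumed representative of string kernels):
    inner product of the counts of all length-k substrings."""
    a = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    b = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    return sum(a[s] * b[s] for s in a.keys() & b.keys())
```

The three functions reflect the contrast drawn in the abstract: the edit distance compares strings through character-level transformations, the bag-of-word model ignores order and compares word frequencies, and the kernel compares strings through shared substrings. For instance, `edit_distance("kitten", "sitting")` evaluates to 3.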