This paper introduces a simple Sum-over-Paths (SoP) formulation of string edit distances that accounts for all possible alignments between two sequences, and extends related previous work from bioinformatics to the case of graphs with cycles. Each alignment $\wp$, with a total cost $C(\wp)$, is assigned a probability of occurrence $P(\wp) = \exp[-\theta C(\wp)]/Z$, where $Z$ is a normalization factor. Good alignments (having a low cost) are therefore favored over bad alignments (having a high cost). The expected cost $\sum_{\wp \in \mathcal{P}} C(\wp) \exp[-\theta C(\wp)]/Z$, computed over all possible alignments $\wp \in \mathcal{P}$, defines the SoP edit distance. When $\theta \to \infty$, only the best alignments matter and the measure reduces to the standard edit distance. The rationale behind this definition is the following: for some applications, two sequences sharing many good alignments should be considered more similar than two sequences having only a single good (optimal) alignment in common. In other words, sub-optimal alignments can also be taken into account. Forward/backward recurrences that efficiently compute the expected cost are developed. Virtually any Viterbi-like sequence comparison algorithm computed on a lattice can be generalized in the same way; for instance, a SoP longest common subsequence is also developed. Pattern classification tasks performed on five data sets show that the new measures usually outperform the standard ones and, in any case, never perform significantly worse, at the expense of tuning the parameter $\theta$.
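The expected-cost definition above can be illustrated with a minimal Python sketch. It is not the paper's forward/backward recurrences: it is a forward-only dynamic program on the standard edit lattice, written under several assumptions that are not in the abstract (unit insertion, deletion, and substitution costs; zero cost for a match; no prior weighting of alignments). The function name `sop_edit_distance` and its parameters are hypothetical. For each lattice node it accumulates the partition sum $Z$ and the cost-weighted sum $\sum C(\wp)\exp[-\theta C(\wp)]$ over all partial alignments reaching that node.

```python
import math


def sop_edit_distance(x, y, theta, ins=1.0, dele=1.0, sub=1.0):
    """Expected alignment cost under P(path) = exp(-theta * C(path)) / Z.

    Illustrative sketch only: the costs and the forward-only recurrence
    are assumptions, not the authors' exact formulation.
    """
    n, m = len(x), len(y)
    # Z[i][j]: sum of exp(-theta * C) over all partial alignments reaching (i, j)
    # W[i][j]: sum of C * exp(-theta * C) over the same partial alignments
    Z = [[0.0] * (m + 1) for _ in range(n + 1)]
    W = [[0.0] * (m + 1) for _ in range(n + 1)]
    Z[0][0] = 1.0  # the empty alignment has cost 0 and weight 1

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            moves = []  # (previous i, previous j, local cost)
            if i > 0:
                moves.append((i - 1, j, dele))  # delete x[i-1]
            if j > 0:
                moves.append((i, j - 1, ins))  # insert y[j-1]
            if i > 0 and j > 0:
                c = 0.0 if x[i - 1] == y[j - 1] else sub
                moves.append((i - 1, j - 1, c))  # match or substitute
            z = 0.0
            w = 0.0
            for pi, pj, c in moves:
                f = math.exp(-theta * c)
                z += Z[pi][pj] * f
                # extending a path by cost c adds c to every path cost
                w += (W[pi][pj] + c * Z[pi][pj]) * f
            Z[i][j] = z
            W[i][j] = w

    return W[n][m] / Z[n][m]


if __name__ == "__main__":
    for theta in (0.5, 2.0, 20.0):
        print(theta, sop_edit_distance("kitten", "sitting", theta))
```

For large `theta` the value approaches the standard edit distance (3 for "kitten" and "sitting" under unit costs), in line with the limit stated above. This plain floating-point version underflows when `theta` times the optimal cost becomes very large; a log-domain variant would be needed in that regime.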