Speeding up HMM decoding and training by exploiting sequence repetitions

Authors:
Shay Mozes;Oren Weimann;Michal Ziv-Ukelson
Affiliations:
MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA;MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA;School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
Venue:
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
Year:
2007

References cited by this paper (15), followed by papers that cite it (3):

Matrix multiplication via arithmetic progressions

Journal of Symbolic Computation - Special issue on computational algebraic complexity
An improved algorithm for computing the edit distance of run-length coded strings

Information Processing Letters
Let sleeping files lie: pattern matching in Z-compressed files

Journal of Computer and System Sciences
Algorithmic aspects in speech recognition: an introduction

Journal of Experimental Algorithmics (JEA)
Matching for run-length encoded strings

Journal of Complexity
Foundations of statistical natural language processing

Foundations of statistical natural language processing
A sub-quadratic sequence alignment algorithm for unrestricted cost matrices

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Approximate String Matching over Ziv-Lempel Compressed Text

CPM '00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching
Approximate Matching of Run-Length Compressed Strings

CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
A Text Compression Scheme That Allows Fast Searching Directly in the Compressed File

CPM '94 Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching
Speeding Up Pattern Matching by Text Compression

CIAC '00 Proceedings of the 4th Italian Conference on Algorithms and Complexity
Faster Approximate String Matching over Compressed Text

DCC '01 Proceedings of the Data Compression Conference
All-pairs shortest paths with real weights in O(n^3 / log n) time

WADS'05 Proceedings of the 9th international conference on Algorithms and Data Structures
Error bounds for convolutional codes and an asymptotically optimum decoding algorithm

IEEE Transactions on Information Theory
On the Complexity of Finite Sequences

IEEE Transactions on Information Theory

CarpeDiem: Optimizing the Viterbi Algorithm and Applications to Supervised Sequential Learning

The Journal of Machine Learning Research
Efficient staggered decoding for sequence labeling

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Speeding up Bayesian HMM by the four Russians method

WABI'11 Proceedings of the 11th international conference on Algorithms in bioinformatics


Abstract

We present a method to speed up the dynamic programming algorithms used for solving the HMM decoding and training problems for discrete time-independent HMMs. We discuss the application of our method to Viterbi's decoding and training algorithms [21], as well as to the forward-backward and Baum-Welch [4] algorithms. Our approach is based on identifying repeated substrings in the observed input sequence. We describe three algorithms, based respectively on byte pair encoding (BPE) [19], run-length encoding (RLE), and Lempel-Ziv (LZ78) parsing [22]. Compared to Viterbi's algorithm, we achieve a speedup of Ω(r) using BPE, a speedup of Ω(r / log r) using RLE, and a speedup of Ω((log n) / k) using LZ78, where k is the number of hidden states, n is the length of the observed sequence, and r is its compression ratio (under each compression scheme). Our experimental results demonstrate that our new algorithms are indeed faster in practice. Furthermore, unlike Viterbi's algorithm, our algorithms are highly parallelizable.
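The abstract gives no implementation details, so the following is only a minimal sketch of how repeated substrings might be exploited in the run-length case, not the authors' algorithm. It assumes each Viterbi step is written as a (max, +) matrix-vector product in log space, so that a run of identical symbols collapses into one matrix power computed by repeated squaring. All function names, the toy two-state model, and the convention that a transition precedes every emission are assumptions made for illustration. The sketch returns only the score of the best path, not the path itself.

```python
import math
from itertools import groupby

def step_matrix(log_trans, log_emit, symbol):
    """M[i][j] = log P(j -> i) + log P(state i emits symbol)."""
    k = len(log_trans)
    return [[log_trans[j][i] + log_emit[i][symbol] for j in range(k)]
            for i in range(k)]

def maxplus_matmul(a, b):
    """Matrix product in the (max, +) semiring."""
    k = len(a)
    return [[max(a[i][m] + b[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]

def maxplus_matvec(a, v):
    """Matrix-vector product in the (max, +) semiring."""
    return [max(row[j] + v[j] for j in range(len(v))) for row in a]

def maxplus_power(m, e):
    """m raised to the power e >= 1 by repeated squaring."""
    result, base = None, m
    while e:
        if e & 1:
            # Powers of the same matrix commute, so the order is irrelevant.
            result = base if result is None else maxplus_matmul(result, base)
        e >>= 1
        if e:
            base = maxplus_matmul(base, base)
    return result

def viterbi_plain(log_init, log_trans, log_emit, obs):
    """Log-probability of the best state path, one step per symbol."""
    v = list(log_init)
    for s in obs:
        v = maxplus_matvec(step_matrix(log_trans, log_emit, s), v)
    return max(v)

def viterbi_rle(log_init, log_trans, log_emit, obs):
    """Same score, but each run of equal symbols costs one matrix power."""
    v = list(log_init)
    for s, run in groupby(obs):
        length = sum(1 for _ in run)
        m = maxplus_power(step_matrix(log_trans, log_emit, s), length)
        v = maxplus_matvec(m, v)
    return max(v)

# Hypothetical two-state model over the alphabet {0, 1}.
log_init = [math.log(0.6), math.log(0.4)]
log_trans = [[math.log(0.7), math.log(0.3)],
             [math.log(0.4), math.log(0.6)]]
log_emit = [[math.log(0.9), math.log(0.1)],
            [math.log(0.2), math.log(0.8)]]
obs = [0] * 6 + [1] * 5 + [0] * 3

plain_score = viterbi_plain(log_init, log_trans, log_emit, obs)
rle_score = viterbi_rle(log_init, log_trans, log_emit, obs)
```

In this sketch a run of length l costs O(log l) matrix products instead of l matrix-vector products, which only pays off when runs are long relative to the number of states. The two scores should agree up to floating-point rounding, since the additions are merely grouped differently.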