Speeding up HMM decoding and training by exploiting sequence repetitions

  • Authors:
  • Shay Mozes; Oren Weimann; Michal Ziv-Ukelson

  • Affiliations:
  • MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA; MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA; School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel

  • Venue:
  • CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
  • Year:
  • 2007

Abstract

We present a method to speed up the dynamic programming algorithms used to solve the HMM decoding and training problems for discrete time-independent HMMs. We discuss the application of our method to Viterbi's decoding and training algorithms [21], as well as to the forward-backward and Baum-Welch [4] algorithms. Our approach is based on identifying repeated substrings in the observed input sequence. We describe three algorithms, based respectively on byte pair encoding (BPE) [19], run-length encoding (RLE), and Lempel-Ziv (LZ78) parsing [22]. Compared to Viterbi's algorithm, we achieve a speedup of Ω(r) using BPE, a speedup of Ω(r / log r) using RLE, and a speedup of Ω(log n / k) using LZ78, where k is the number of hidden states, n is the length of the observed sequence, and r is its compression ratio (under each compression scheme). Our experimental results demonstrate that the new algorithms are indeed faster in practice. Furthermore, unlike Viterbi's algorithm, our algorithms are highly parallelizable.