A segment-based interpretation of HMM/ANN hybrids

  • Authors:
  • László Tóth;András Kocsor

  • Affiliations:
  • Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and the University of Szeged, Szeged, Aradi vértanúk tere 1, H-6720 Szeged, Hungary (both authors)

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2007

Abstract

Here we seek to understand the similarities and differences between two speech recognition approaches, namely the HMM/ANN hybrid and the posterior-based segmental model. Both techniques create local posterior probability estimates and combine them into global posteriors, but they are built on somewhat different concepts and mathematical derivations. The HMM/ANN hybrid combines the local estimates via a formulation that is inherited from the generative HMM concept, while the components of the segment-based model correspond quite directly to the two subtasks of phonetic decoding: segmentation and classification. In this paper we attempt to identify the corresponding components of the segmental model within the hybrid model, with the intent of gaining insight from this unusual point of view. As regards one of these components, the segment-based phone posteriors, we show that the independence-based product rule combination applied in the hybrid produces strongly biased estimates. As for the other component, the segmentation probability factor, we argue that it is present in the hybrid thanks to the bias of the product rule: the product rule errs in such a particular way that it helps the model find the best segmentation of the input. To support this assertion, we combine this bias with the posterior estimates obtained by averaging, and find that the resulting 'averaging hybrid' slightly outperforms the standard one on both a phone recognition task and a word recognition task. Overall we conclude that the contribution of the product rule to the decoding process is just as important for the segmentation subtask as it is for the segment classification subtask.
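The two ways of combining frame-level posteriors that the abstract contrasts can be illustrated with a small numerical sketch. This is not the authors' implementation: the function names and the posterior values are hypothetical, and the formulation in the paper may differ (for example, through division by class priors or the inclusion of transition probabilities). The sketch only shows that multiplying frame posteriors under an independence assumption gives scores that no longer sum to one over the classes, whereas averaging them keeps a valid distribution.

```python
import numpy as np


def product_rule(frame_posteriors):
    """Combine frame posteriors by multiplying them per class.

    Assumes frames are independent. Computed in the log domain for
    numerical stability. The result is NOT renormalized.
    """
    log_p = np.log(frame_posteriors)
    return np.exp(log_p.sum(axis=0))


def averaging_rule(frame_posteriors):
    """Combine frame posteriors by taking their per-class mean."""
    return frame_posteriors.mean(axis=0)


# Hypothetical posteriors for one segment: 4 frames (rows), 3 phone classes (columns).
frames = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
    [0.5, 0.4, 0.1],
])

prod = product_rule(frames)
avg = averaging_rule(frames)

print("product rule:", prod, "sum =", prod.sum())
print("averaging   :", avg, "sum =", avg.sum())
```

In this toy case both rules pick the same class, but the product scores sum to far less than one and shrink further as more frames are added, so they depend on segment length. The abstract's argument that this bias helps the hybrid find the best segmentation is not reproduced by the sketch.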