N-best rescoring based on pitch-accent patterns

  • Authors:
  • Je Hun Jeon;Wen Wang;Yang Liu

  • Affiliations:
  • The University of Texas at Dallas;Speech Technology and Research Laboratory, SRI International;The University of Texas at Dallas

  • Venue:
  • HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we adopt an n-best rescoring scheme using pitch-accent patterns to improve automatic speech recognition (ASR) performance. The pitch-accent model is decoupled from the main ASR system, thus allowing us to develop it independently. N-best hypotheses from recognizers are rescored by additional scores that measure the correlation of the pitch-accent patterns between the acoustic signal and lexical cues. To test the robustness of our algorithm, we use two different data sets and recognition setups: the first one is English radio news data that has pitch accent labels, but the recognizer is trained from a small amount of data and has high error rate; the second one is English broadcast news data using a state-of-the-art SRI recognizer. Our experimental results demonstrate that our approach is able to reduce word error rate relatively by about 3%. This gain is consistent across the two different tests, showing promising future directions of incorporating prosodic information to improve speech recognition.