Error detection in broadcast news ASR using Markov chains

  • Authors:
  • Thomas Pellegrini;Isabel Trancoso

  • Affiliations:
  • INESC-ID, LISBON, Portugal;INESC-ID, LISBON, Portugal

  • Venue:
  • LTC'09 Proceedings of the 4th conference on Human language technology: challenges for computer science and linguistics
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

This article addresses error detection in broadcast news automatic transcription, as a post-processing stage. Based on the observation that many errors appear in bursts, we investigated the use of Markov Chains (MC) for their temporal modelling capabilities. Experiments were conducted on a large Amercian English broadcast news corpus from NIST. Common features in error detection were used, all decoder-based. MC classification performance was compared with a discriminative maximum entropy model (Maxent), currently used in our in-house decoder to estimate confidence measures, and also with Gaussian Mixture Models (GMM). The MC classifier obtained the best results, by detecting 16.2% of the errors, with the lowest classification error rate of 16.7%. To be compared with the GMM classifier, MC allowed to lower the number of false detections, by 23.5% relative. The Maxent system achieved the same CER, but detected only 7.2% of the errors.