Audio Partitioning and Transcription for Broadcast Data Indexation

  • Authors:
  • J. L. Gauvain; L. Lamel; G. Adda

  • Affiliations:
  • Spoken Language Processing Group, LIMSI-CNRS, BP 133, 91403 Orsay, France (gauvain@limsi.fr; lamel@limsi.fr; gadda@limsi.fr)

  • Venue:
  • Multimedia Tools and Applications
  • Year:
  • 2001

Abstract

This work addresses the automatic transcription of television and radio broadcasts in multiple languages. Transcribing such data is a major step toward automatic tools for indexation and retrieval of the vast amounts of information generated daily. Radio and television broadcasts form a continuous data stream made up of segments of differing linguistic and acoustic natures, which poses challenges for transcription. Prior to word recognition, the data are partitioned into homogeneous acoustic segments. Non-speech segments are identified and removed, and the speech segments are clustered and labeled according to bandwidth and gender. Word recognition is carried out with a speaker-independent, large-vocabulary continuous speech recognizer that uses n-gram statistics for language modeling and continuous-density HMMs with Gaussian mixtures for acoustic modeling. This system has consistently obtained top-level performance in DARPA evaluations. Over 500 hours of unpartitioned, unrestricted American English broadcast data have been partitioned, transcribed and indexed, with an average word error rate of about 20%. With current IR technology, information retrieval performance on this data set is essentially the same whether automatic or manual transcriptions are used.
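
For readers unfamiliar with the word error figure quoted above, the following is a minimal sketch of how such a rate is commonly computed: the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and the paper's own scoring procedure (including any text normalization) may differ.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length.

    Illustrative helper (hypothetical name); words are split on whitespace
    with no text normalization, which real scoring tools usually apply.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Made-up example: one deleted word ("will") and one substitution
    # ("two" -> "to") against a ten-word reference.
    ref_text = "the evening news will start at eight on channel two"
    hyp_text = "the evening news start at eight on channel to"
    print(word_error_rate(ref_text, hyp_text))  # 0.2
```

In this made-up example, two errors against ten reference words give a rate of 0.2, i.e. the same order as the roughly 20% average reported in the abstract.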