Domain Adaptation of a Broadcast News Transcription System for the Portuguese Parliament

  • Authors:
  • Luís Neves;Ciro Martins;Hugo Meinedo;João Neto

  • Affiliations:
  • L2F - Spoken Language Systems Lab, INESC-ID/IST, Lisboa, Portugal 1000-029;L2F - Spoken Language Systems Lab, INESC-ID/IST, Lisboa, Portugal 1000-029 and Department of Electronics, Telecommunications & Informatics/IEETA, Aveiro University, Portugal;L2F - Spoken Language Systems Lab, INESC-ID/IST, Lisboa, Portugal 1000-029;L2F - Spoken Language Systems Lab, INESC-ID/IST, Lisboa, Portugal 1000-029

  • Venue:
  • PROPOR '08 Proceedings of the 8th international conference on Computational Processing of the Portuguese Language
  • Year:
  • 2008

Abstract

The main goal of this work is the adaptation of a broadcast news transcription system to a new domain, namely the plenary meetings of the Portuguese Parliament. This paper describes the domain adaptation steps that reduced the word error rate from a baseline of 20.1% to 16.1%, an absolute reduction of 4.0 points (about 20% relative). These steps are vocabulary selection, to include domain-specific terms; language model adaptation, by interpolation of several different models; and acoustic model adaptation, using an unsupervised confidence-based approach.
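As an illustration of the language model interpolation step mentioned above, the following is a minimal sketch of generic linear interpolation, P(w) = Σ λ_i · P_i(w). It is not the authors' implementation: the function name `interpolate`, the toy unigram tables, and the weights are all assumptions made up for this example, and a real system would interpolate full n-gram models with weights tuned on held-out in-domain text.

```python
def interpolate(models, weights):
    """Linearly interpolate word-probability tables: P(w) = sum_i lambda_i * P_i(w).

    `models` is a list of dicts mapping word -> probability (hypothetical toy
    format); `weights` are the interpolation weights and must sum to 1.
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    # The interpolated vocabulary is the union of the component vocabularies.
    vocab = set().union(*models)
    return {
        w: sum(lam * m.get(w, 0.0) for lam, m in zip(weights, models))
        for w in vocab
    }


# Made-up unigram probabilities, for illustration only (not from the paper).
news = {"government": 0.5, "football": 0.5}
parliament = {"government": 0.6, "deputy": 0.4}

# Assumed weights: favour the in-domain (parliament) model.
mixed = interpolate([news, parliament], [0.25, 0.75])
```

Because each component table sums to 1 and the weights sum to 1, the interpolated table is still a valid probability distribution, while words seen only in the in-domain data (here `deputy`) receive non-zero probability.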