Automatic Transcription of Czech Language Oral History in the MALACH Project: Resources and Initial Experiments

  • Authors:
  • Josef Psutka; Pavel Ircing; Josef V. Psutka; Vlasta Radová; William J. Byrne; Jan Hajic; Samuel Gustman; Bhuvana Ramabhadran

  • Venue:
  • TSD '02 Proceedings of the 5th International Conference on Text, Speech and Dialogue
  • Year:
  • 2002

Abstract

In this paper we describe the initial stages of the automatic speech recognition (ASR) component of the MALACH (Multilingual Access to Large Spoken Archives) project. This project will attempt to provide improved access to the large multilingual spoken archives collected by the Survivors of the Shoah Visual History Foundation (VHF) by advancing the state of the art in ASR. To train the ASR system, it is necessary to manually transcribe a large amount of speech data, identify the appropriate vocabulary, and obtain relevant text for language modeling. We give a detailed description of the speech annotation process, show the specific properties of the spontaneous speech contained in the archives, and present baseline speech recognition results.