Supporting access to large digital oral history archives
Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
IEEE Transactions on Audio, Speech, and Language Processing
TSD'10 Proceedings of the 13th international conference on Text, speech and dialogue
In this paper we describe the initial stages of the automatic speech recognition (ASR) component of the MALACH (Multilingual Access to Large Spoken Archives) project. The project aims to improve access to the large multilingual spoken archives collected by the Survivors of the Shoah Visual History Foundation (VHF) by advancing the state of the art in automatic speech recognition. To train the ASR system, it is necessary to manually transcribe a large amount of speech data, identify the appropriate vocabulary, and obtain relevant text for language modeling. We give a detailed description of the speech annotation process, show the specific properties of the spontaneous speech contained in the archives, and present baseline speech recognition results.
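Identifying the vocabulary is commonly judged by the out-of-vocabulary (OOV) rate, the fraction of running words in held-out speech that the recognizer's word list does not cover. The sketch below illustrates that idea under a stated assumption: the vocabulary is simply the most frequent words in the manually transcribed training data. The abstract does not say how the MALACH vocabulary was actually selected, and the function names (`select_vocabulary`, `oov_rate`) and toy sentences are hypothetical.

```python
from collections import Counter


def select_vocabulary(transcripts, max_size):
    """Hypothetical frequency-based selection: keep the max_size most
    frequent words across the training transcripts."""
    counts = Counter(word for line in transcripts for word in line.lower().split())
    return {word for word, _ in counts.most_common(max_size)}


def oov_rate(transcripts, vocabulary):
    """Fraction of running words in held-out transcripts that are not
    covered by the vocabulary (0.0 for empty input)."""
    tokens = [word for line in transcripts for word in line.lower().split()]
    if not tokens:
        return 0.0
    missing = sum(1 for word in tokens if word not in vocabulary)
    return missing / len(tokens)


# Toy usage with invented sentences (not data from the archives).
train = ["we came to the camp in the winter", "the winter was very cold"]
held_out = ["the camp was cold", "we were liberated in the spring"]
vocab = select_vocabulary(train, max_size=5)
print(f"OOV rate: {oov_rate(held_out, vocab):.2f}")
```

Spontaneous testimonies tend to contain names, places, and dialectal forms that a frequency cutoff misses, which is one reason a real system would likely combine transcripts with additional relevant text rather than rely on counts alone.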