Fast phonetic/lexical searching in the archives of the Czech holocaust testimonies: advancing towards the MALACH project visions

Authors:
Josef Psutka;Jan Švec;Josef V. Psutka;Jan Vaněk;Aleš Pražák;Luboš Šmídl
Affiliations:
Department of Cybernetics, West Bohemia University, Pilsen, Czech Republic;Department of Cybernetics, West Bohemia University, Pilsen, Czech Republic;Department of Cybernetics, West Bohemia University, Pilsen, Czech Republic;Department of Cybernetics, West Bohemia University, Pilsen, Czech Republic;Department of Cybernetics, West Bohemia University, Pilsen, Czech Republic;Department of Cybernetics, West Bohemia University, Pilsen, Czech Republic
Venue:
TSD'10 Proceedings of the 13th international conference on Text, speech and dialogue
Year:
2010

Citing 4
Cited 0

Automatic Transcription of Czech Language Oral History in the MALACH Project: Resources and Initial Experiments
TSD '02 Proceedings of the 5th International Conference on Text, Speech and Dialogue

Refinement Approach for Adaptation Based on Combination of MAP and fMLLR
TSD '09 Proceedings of the 12th International Conference on Text, Speech and Dialogue

Discriminative Training of Gender-Dependent Acoustic Models
TSD '09 Proceedings of the 12th International Conference on Text, Speech and Dialogue

Benefit of proper language processing for Czech speech retrieval in the CL-SR task at CLEF 2006
CLEF'06 Proceedings of the 7th international conference on Cross-Language Evaluation Forum: evaluation of multilingual and multi-modal information retrieval

Abstract

In this paper we describe a system for fast phonetic/lexical searching in the large archives of Czech Holocaust testimonies. The system is a first step toward fulfilling the MALACH project visions [1, 2], at least with regard to easier and faster access to the Czech part of the archives. More than one thousand hours of spontaneous, accented and highly emotional speech of Czech Holocaust survivors, stored at the USC Shoah Foundation Institute as video interviews, were automatically transcribed and phonetically/lexically indexed. Special attention was paid to the processing of colloquial words, which appear very frequently in spontaneous Czech speech. The resulting access to the archives is very fast and makes it possible to detect interview segments containing the spoken query words, clusters of words occurring within pre-defined time intervals, and also words that were not included in the working vocabulary (OOV words).
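The abstract does not say how the index is organized, so the following is only an illustrative sketch of one of the query types it mentions: finding clusters of words that occur within a pre-defined time interval. It assumes a time-stamped word transcript is already available; the names `build_index` and `find_clusters`, the tuple layout, and the sample data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import product


def build_index(transcript):
    """Build an inverted index: word -> list of (interview_id, start_seconds).

    `transcript` is an iterable of (interview_id, word, start_seconds)
    tuples. This layout is an assumption made for illustration only.
    """
    index = defaultdict(list)
    for interview_id, word, start in transcript:
        index[word.lower()].append((interview_id, start))
    return index


def find_clusters(index, words, window):
    """Return sorted (interview_id, first_time, last_time) hits in which
    every query word occurs in the same interview within `window` seconds.
    """
    postings = [index.get(w.lower(), []) for w in words]
    hits = set()
    # Try every combination of one occurrence per query word.
    for combo in product(*postings):
        interviews = {iv for iv, _ in combo}
        times = [t for _, t in combo]
        if len(interviews) == 1 and max(times) - min(times) <= window:
            hits.add((combo[0][0], min(times), max(times)))
    return sorted(hits)


# Hypothetical sample data: (interview id, recognized word, start time in s).
sample = [
    ("iv1", "Terezín", 12.0),
    ("iv1", "transport", 15.5),
    ("iv1", "transport", 300.0),
    ("iv2", "Terezín", 40.0),
]
sample_index = build_index(sample)
sample_hits = find_clusters(sample_index, ["terezín", "transport"], 10.0)
```

The brute-force combination step keeps the sketch short; it is not meant to reflect the speed the abstract reports. Searching for OOV words would need a phonetic (sub-word) index rather than a word index, which this sketch does not attempt.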