Information retrieval test collection for searching spontaneous Czech speech

  • Authors:
  • Pavel Ircing;Pavel Pecina;Douglas W. Oard;Jianqiang Wang;Ryen W. White;Jan Hoidekr

  • Affiliations:
  • University of West Bohemia, Faculty of Applied Sciences, Department of Cybernetics, Plzeň, Czech Republic;Charles University, Institute of Formal and Applied Linguistics, Praha, Czech Republic;University of Maryland, College of Information Studies, UMIACS, College Park, MD;State University of New York at Buffalo, Department of Library and Information Studies, Buffalo, NY;Microsoft Research, Redmond, WA;University of West Bohemia, Faculty of Applied Sciences, Department of Cybernetics, Plzeň, Czech Republic

  • Venue:
  • TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue
  • Year:
  • 2007

Abstract

This paper describes the design of the first large-scale IR test collection built for the Czech language. Building the collection was particularly challenging because it is based on a continuous text stream produced by automatic transcription of spontaneous speech and therefore lacks clearly defined document boundaries. All aspects of building the collection are presented, together with some general findings from initial experiments.