This paper describes the design of the first large-scale IR test collection built for the Czech language. Creating this collection was also particularly challenging: it is based on a continuous text stream produced by automatic transcription of spontaneous speech, and it therefore lacks clearly defined document boundaries. We present all aspects of building the collection, together with some general findings from initial experiments.
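Because the transcripts have no document boundaries, some retrieval unit has to be imposed on the stream before documents can be indexed and judged. One common workaround is to cut the time-stamped word stream into overlapping windows of fixed duration. The sketch below shows that technique only as an illustration. It is not necessarily the segmentation used for this collection, and the `Word` type, the `sliding_window_passages` function, and the default `window`/`step` values (180 s and 60 s) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    token: str
    start: float  # start time of the word in seconds (hypothetical ASR output format)


def sliding_window_passages(words, window=180.0, step=60.0):
    """Split a time-sorted word stream into overlapping fixed-duration passages.

    Hypothetical helper: each passage covers the half-open interval
    [t, t + window), and t advances by `step` until the last word is covered.
    Returns a list of (start_time, [tokens]) pairs; empty windows are skipped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if not words:
        return []

    passages = []
    end_time = words[-1].start  # assumes `words` is sorted by start time
    t = words[0].start
    while True:
        tokens = [w.token for w in words if t <= w.start < t + window]
        if tokens:  # a silence gap can produce an empty window
            passages.append((t, tokens))
        if t + window > end_time:  # the last word falls inside this window
            break
        t += step
    return passages
```

With ten words spaced one second apart, `window=4.0` and `step=2.0` yield passages starting at 0, 2, 4 and 6 seconds. The overlap means that a short relevant stretch (no longer than `window - step`) always appears whole in at least one passage, at the cost of near-duplicate passages appearing in a ranked list.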