UNED@CL-SR CLEF 2005: mixing different strategies to retrieve automatic speech transcriptions

Authors:
Fernando López-Ostenero;Víctor Peinado;Valentín Sama;Felisa Verdejo
Affiliations:
NLP Group, ETSI Informática, UNED, Madrid, Spain;NLP Group, ETSI Informática, UNED, Madrid, Spain;NLP Group, ETSI Informática, UNED, Madrid, Spain;NLP Group, ETSI Informática, UNED, Madrid, Spain
Venue:
CLEF'05 Proceedings of the 6th international conference on Cross-Language Evaluation Forum: accessing Multilingual Information Repositories
Year:
2005

Abstract

In this paper we describe UNED’s participation in the CLEF CL-SR 2005 track. First, we explain the strategies we tried for cleaning up the automatic transcriptions. Then, we describe how we performed 84 different runs combining these strategies with named entity recognition and different pseudo-relevance feedback approaches, in order to study the influence of each method on the retrieval process in both monolingual and cross-lingual environments. We found that named entity recognition had a greater influence in the cross-lingual environment, where MAP scores doubled when we used an entity recognizer. The best pseudo-relevance feedback approach was the one using manual keywords. The different cleaning strategies performed very similarly, except for character 3-grams, which obtained poor scores compared with the other approaches.
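
For readers unfamiliar with two terms used in the abstract, the following is a minimal illustrative sketch, not the authors' implementation. It shows one common way to split text into overlapping character 3-grams and the standard definition of mean average precision (MAP) over a set of topics. The function names, the lowercasing and whitespace tokenization, and the handling of tokens shorter than three characters are all assumptions made for illustration; the abstract does not specify these details.

```python
def char_ngrams(text, n=3):
    """Split each whitespace-delimited token into overlapping character n-grams.

    Assumption: tokens of length <= n are kept whole; the paper may differ.
    """
    grams = []
    for token in text.lower().split():
        if len(token) <= n:
            grams.append(token)
        else:
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP over topics; `runs` holds (ranked_ids, relevant_ids) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Under these assumptions, `char_ngrams("speech")` yields the four grams `spe`, `pee`, `eec`, `ech`. A ranking that places relevant documents first and third out of two relevant documents has an average precision of (1/1 + 2/3) / 2.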