UB at CLEF2004: cross language information retrieval using statistical language models

  • Authors:
  • Miguel E. Ruiz; Munirathnam Srikanth

  • Affiliations:
  • School of Informatics, Dept. of Library and Information Studies, State University of New York at Buffalo, Buffalo, NY; Language Computer Corporation, Richardson, TX

  • Venue:
  • CLEF'04 Proceedings of the 5th Conference on Cross-Language Evaluation Forum: Multilingual Information Access for Text, Speech and Images
  • Year:
  • 2004

Abstract

This paper presents the results of the State University of New York at Buffalo (UB) in the monolingual and multilingual tasks at CLEF 2004. For these tasks we used an approach based on statistical language modeling. Our ad hoc retrieval work used the TAPIR toolkit, developed in-house by M. Srikanth. Our work focused on validating and adapting the language model system to work in a multilingual environment, and on exploring ways to merge results from multiple collections into a single list. In particular, we explored the use of the clarity score, a measure of query ambiguity, for merging the results of the individual collections into a single list of retrieved documents. Our results indicate that using clarity scores normalized across queries gives statistically significant improvements over using a fixed merging order.
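
To make the merging idea concrete, the following is a minimal sketch and not the authors' TAPIR implementation. It assumes the common definition of the clarity score as the KL divergence between a query language model and a collection language model, and it assumes one simple way of using that score: scaling min-max normalized document scores from each collection by a per-collection clarity weight before sorting. The function names (unigram_model, clarity_score, merge_by_clarity) and the weighting scheme are hypothetical; the abstract does not specify how the clarity scores were normalized across queries or applied during merging.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def clarity_score(query_model, collection_model):
    """KL divergence (in bits) between a query model and a collection model.

    Assumption: words missing from the collection model are skipped; in
    practice the query model is usually estimated from retrieved documents,
    so every query-model word also occurs in the collection.
    """
    score = 0.0
    for word, p_query in query_model.items():
        p_coll = collection_model.get(word)
        if p_query > 0 and p_coll:
            score += p_query * math.log2(p_query / p_coll)
    return score


def merge_by_clarity(runs, clarity, k=10):
    """Merge per-collection result lists into one ranked list.

    runs:    {collection_name: [(doc_id, retrieval_score), ...]}
    clarity: {collection_name: clarity-based weight for this query}
    Scores are min-max normalized within each collection, then scaled by
    that collection's weight (a hypothetical scheme for illustration).
    """
    merged = []
    for name, results in runs.items():
        if not results:
            continue
        scores = [score for _, score in results]
        low, high = min(scores), max(scores)
        span = (high - low) or 1.0  # avoid division by zero
        for doc_id, score in results:
            merged.append((doc_id, clarity[name] * (score - low) / span))
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged[:k]
```

Under this sketch, a collection for which the query is less ambiguous (higher clarity) contributes documents nearer the top of the merged list, in contrast to a fixed merging order that treats every collection the same way for every query.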