Web retrieval experiments with the EuroGOV corpus at the University of Hildesheim

  • Authors:
  • Niels Jensen;René Hackl;Thomas Mandl;Robert Strötgen

  • Affiliations:
  • Information Science, Universität Hildesheim, Hildesheim, Germany;Information Science, Universität Hildesheim, Hildesheim, Germany;Information Science, Universität Hildesheim, Hildesheim, Germany;Information Science, Universität Hildesheim, Hildesheim, Germany

  • Venue:
  • CLEF'05 Proceedings of the 6th international conference on Cross-Language Evaluation Forum: accessing Multilingual Information Repositories
  • Year:
  • 2005

Abstract

This paper describes web retrieval experiments with the EuroGOV corpus carried out at the University of Hildesheim. For both the multilingual and the mixed monolingual task, several indexing strategies were tested, all of them based on a single mixed-language index. After stopword removal, word-based and n-gram-based indexes were built from the full document content, part of the content, and the document title. Boosting the original topic language with a higher weight in the query and penalizing the English translation led to better results for most settings. A title-only run gave the best results among the post-submission runs for the multilingual task.
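
The two ideas named in the abstract, character n-gram indexing terms and a query that weights the original topic language above its English translation, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `char_ngrams` and `weighted_query`, the n-gram length of 4, and the weights 2.0 and 0.5. The retrieval system and settings actually used are not stated in the abstract.

```python
def char_ngrams(text, n=4):
    """Split each lowercased word into overlapping character n-grams.

    Words no longer than n characters are kept whole.
    The default n=4 is an illustrative assumption.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def weighted_query(original, english, boost=2.0, penalty=0.5,
                   stopwords=frozenset()):
    """Build a term -> weight mapping for a topic and its English translation.

    Terms from the original topic language receive `boost`; terms from the
    English translation receive `penalty`. Both values are hypothetical.
    A term that occurs in both versions accumulates both weights.
    """
    weights = {}
    for term in original.lower().split():
        if term not in stopwords:
            weights[term] = weights.get(term, 0.0) + boost
    for term in english.lower().split():
        if term not in stopwords:
            weights[term] = weights.get(term, 0.0) + penalty
    return weights


if __name__ == "__main__":
    print(char_ngrams("Ministerium der"))
    print(weighted_query("Gesundheit Ministerium", "health ministry"))
```

In a real engine the resulting weights would typically be passed as per-term boost factors in the query, and the same tokenizer would have to be applied at indexing time and at query time so that n-gram terms match.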