Different indexing strategies for multilingual web retrieval: experiments with the EuroGOV corpus

  • Authors:
  • Niels Jensen; Thomas Mandl

  • Affiliations:
  • University of Hildesheim, Hildesheim, Germany; University of Hildesheim, Hildesheim, Germany

  • Venue:
  • Proceedings of the seventeenth conference on Hypertext and hypermedia
  • Year:
  • 2006

Abstract

Experiments with a multilingual web collection are presented. The EuroGOV corpus is the first multilingual web corpus for retrieval evaluation. We show how indexes based on words and on n-grams were built for different parts of the documents. Separate indexes were built from the full document content, from partial content, and from the title alone. The best results were achieved with a title-only index based on words.
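To make the compared index types concrete, here is a minimal sketch of word-based versus n-gram-based inverted indexes built over different document fields. It is not the authors' system. Several assumptions are ours: character 4-grams with `_` boundary markers (the abstract does not state n or the n-gram type), "partial content" taken to mean the first k words of the body (the abstract does not define it), a simple coordination-level ranking (count of matching query terms) instead of the paper's retrieval model, and two hypothetical toy documents. All function names are hypothetical.

```python
import re
from collections import defaultdict


def word_tokens(text: str) -> list[str]:
    # Lowercase and split on non-word characters. In Python 3, \w is
    # Unicode-aware, so accented letters (é, ü, ...) stay inside tokens.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def char_ngrams(text: str, n: int = 4) -> list[str]:
    # Assumption: character n-grams per word, '_' marks word boundaries.
    grams = []
    for word in word_tokens(text):
        padded = f"_{word}_"
        if len(padded) <= n:
            grams.append(padded)  # short words become a single gram
        else:
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def add_partial_field(doc: dict, k: int = 3) -> dict:
    # Hypothetical definition of "partial content": the first k body words.
    return {**doc, "partial": " ".join(doc["body"].split()[:k])}


def build_index(docs: dict, field: str, tokenizer) -> dict:
    # Inverted index for one field: term -> set of document ids.
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenizer(doc[field]):
            index[term].add(doc_id)
    return index


def search(index: dict, query: str, tokenizer) -> list[str]:
    # Coordination-level match: score = number of distinct query terms
    # found in the document; ties broken by document id.
    scores = defaultdict(int)
    for term in set(tokenizer(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical toy documents, not taken from EuroGOV.
docs = {
    "d1": {"title": "Bundesministerium für Gesundheit",
           "body": "Informationen zur Gesundheitspolitik des Bundes und Hinweise zur Pflege"},
    "d2": {"title": "Ministère de la Santé",
           "body": "Informations sur la politique de santé publique en France"},
}
docs = {doc_id: add_partial_field(doc) for doc_id, doc in docs.items()}

title_words = build_index(docs, "title", word_tokens)
title_grams = build_index(docs, "title", char_ngrams)
body_words = build_index(docs, "body", word_tokens)
partial_words = build_index(docs, "partial", word_tokens)

print(search(title_words, "Gesundheitspolitik", word_tokens))
print(search(title_grams, "Gesundheitspolitik", char_ngrams))
print(search(body_words, "Pflege", word_tokens))
print(search(partial_words, "Pflege", word_tokens))
```

The first two queries show the usual trade-off: the word index misses the German compound "Gesundheitspolitik" against a title containing "Gesundheit", while the 4-gram index matches it through shared grams. The last two show that a partial-content index covers less text than the full-content index, so the term "Pflege", which appears late in the body, is found only by the full index.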