Different indexing strategies for multilingual web retrieval: experiments with the EuroGOV corpus

Authors:
Niels Jensen;Thomas Mandl
Affiliations:
University of Hildesheim, Hildesheim, Germany;University of Hildesheim, Hildesheim, Germany
Venue:
Proceedings of the seventeenth conference on Hypertext and hypermedia
Year:
2006

Abstract

This paper presents experiments with a multilingual web collection. The EuroGOV corpus is the first multilingual web corpus for retrieval evaluation. We show how indexes based on words and n-grams are built for different document parts: the full document content, partial content, and the title only. The best results were achieved with a title-only index based on words.
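
The difference between word-based and n-gram-based index terms can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dictionary layout of a document, and the n-gram length of 4 are all assumptions made for illustration, since the abstract does not state them.

```python
import re


def word_tokens(text):
    """Split text into lowercase word tokens (simple Unicode-aware split)."""
    return re.findall(r"\w+", text.lower())


def char_ngrams(text, n=4):
    """Return overlapping character n-grams taken within each word.

    The length n=4 is an assumed value, not one given in the abstract.
    Words no longer than n are kept whole.
    """
    grams = []
    for word in word_tokens(text):
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def index_terms(doc, field="title", unit="word"):
    """Produce index terms for one document part.

    `doc` is a hypothetical dict with keys such as "title" and "content";
    `unit` selects word-based or n-gram-based terms.
    """
    text = doc[field]
    return word_tokens(text) if unit == "word" else char_ngrams(text)
```

For example, `index_terms({"title": "Ministry of Finance"}, field="title", unit="word")` would yield the terms for a title-only word index, while `unit="ngram"` would yield the character n-grams of the same title.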