Web retrieval experiments with the EuroGOV corpus at the University of Hildesheim

Authors:
Niels Jensen;René Hackl;Thomas Mandl;Robert Strötgen
Affiliations:
Information Science, Universität Hildesheim, Hildesheim, Germany;Information Science, Universität Hildesheim, Hildesheim, Germany;Information Science, Universität Hildesheim, Hildesheim, Germany;Information Science, Universität Hildesheim, Hildesheim, Germany
Venue:
CLEF'05 Proceedings of the 6th international conference on Cross-Language Evaluation Forum: accessing Multilingual Information Repositories
Year:
2005

Citing 7

A stemming procedure and stopword list for general French corpora
Journal of the American Society for Information Science

Monolingual Document Retrieval for European Languages
Information Retrieval

Character N-Gram Tokenization for European Language Text Retrieval
Information Retrieval

Language identification: a solved problem suitable for undergraduate instruction
Journal of Computing Sciences in Colleges

Evaluation of a language identification system for mono- and multilingual text documents
Proceedings of the 2006 ACM symposium on Applied computing

The quest to find the best pages on the web
Information Services and Use

Mono- and crosslingual retrieval experiments at the University of Hildesheim
CLEF'04 Proceedings of the 5th conference on Cross-Language Evaluation Forum: multilingual Information Access for Text, Speech and Images

Cited 4

Implementation and evaluation of a quality-based search engine
Proceedings of the seventeenth conference on Hypertext and hypermedia

Different indexing strategies for multilingual web retrieval: experiments with the EuroGOV corpus
Proceedings of the seventeenth conference on Hypertext and hypermedia

Mixed monolingual homepage finding in 34 languages: the role of language script and search domain
Information Retrieval

Multilingual web retrieval experiments with field specific indexing strategies for WebCLEF 2006 at the University of Hildesheim
CLEF'06 Proceedings of the 7th international conference on Cross-Language Evaluation Forum: evaluation of multilingual and multi-modal information retrieval

Abstract

This paper describes web retrieval experiments with the EuroGOV corpus carried out at the University of Hildesheim. For both the multilingual and the mixed monolingual task, several indexing strategies were tested, all of them based on a single mixed-language index. After stopword removal, word-based and n-gram-based indexes were built from the full document content, part of the content, and the document title. Giving the original topic language a higher weight in the query and penalizing the English translation led to better results for most settings. A title-only run gave the best results among the post-submission runs for the multilingual task.
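
The steps named in the abstract (stopword removal, word and character n-gram indexing terms, and a query that weights the original topic language above its English translation) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the stopword list, the n-gram length of 4, the boost and penalty values, and all function names are hypothetical, since the abstract does not specify them.

```python
from collections import Counter

# Hypothetical, tiny multilingual stopword list; the lists actually used
# in the experiments are not given in the abstract.
STOPWORDS = {"the", "of", "and", "der", "die", "und", "le", "la", "et"}


def word_tokens(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (t.strip(".,;:!?()\"'") for t in text.lower().split())
    return [t for t in tokens if t and t not in STOPWORDS]


def char_ngrams(tokens, n=4):
    """Character n-grams within each token; tokens of length <= n are kept whole.

    The default n=4 is an assumption made for this sketch.
    """
    grams = []
    for tok in tokens:
        if len(tok) <= n:
            grams.append(tok)
        else:
            grams.extend(tok[i:i + n] for i in range(len(tok) - n + 1))
    return grams


def weighted_query(original, english, boost=2.0, penalty=0.5):
    """Merge original-language and English-translation terms into one
    term -> weight mapping. The boost and penalty values are assumed."""
    weights = Counter()
    for term in word_tokens(original):
        weights[term] += boost      # original topic language counts more
    for term in word_tokens(english):
        weights[term] += penalty    # English translation counts less
    return dict(weights)


if __name__ == "__main__":
    tokens = word_tokens("Die Regierung der Niederlande")
    print(tokens)
    print(char_ngrams(tokens))
    print(weighted_query("Ministerium der Finanzen", "Ministry of Finance"))
```

In a mixed-language index of the kind the abstract describes, the same tokenizer would be applied to documents and queries alike, and the term weights would be handed to whatever ranking function the retrieval engine provides.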