Multilingual web retrieval experiments with field specific indexing strategies for WebCLEF 2006 at the University of Hildesheim

Authors:
Ben Heuwing;Thomas Mandl;Robert Strötgen
Affiliations:
Universität Hildesheim, Information Science, Hildesheim, Germany;Universität Hildesheim, Information Science, Hildesheim, Germany;Universität Hildesheim, Information Science, Hildesheim, Germany
Venue:
CLEF'06 Proceedings of the 7th international conference on Cross-Language Evaluation Forum: evaluation of multilingual and multi-modal information retrieval
Year:
2006

Citing 4

Web-centric language models
Proceedings of the 14th ACM international conference on Information and knowledge management

Template detection for large scale search engines
Proceedings of the 2006 ACM symposium on Applied computing

Web retrieval experiments with the EuroGOV corpus at the University of Hildesheim
CLEF'05 Proceedings of the 6th international conference on Cross-Language Evaluation Forum: accessing Multilingual Information Repositories

Mono- and crosslingual retrieval experiments at the University of Hildesheim
CLEF'04 Proceedings of the 5th conference on Cross-Language Evaluation Forum: multilingual Information Access for Text, Speech and Images

Cited 1

Mixed monolingual homepage finding in 34 languages: the role of language script and search domain
Information Retrieval

Abstract

For WebCLEF 2006, we carried out experiments on analyzing and extracting the HTML structure of web documents. In addition, blind relevance feedback was applied. As in WebCLEF 2005, a language-independent indexing strategy was pursued. We experimented with the HTML title, the H1 element, and other elements that emphasize text. Our index contained fields for title and H1, emphasized elements, and full and partial content. The best results with the WebCLEF 2005 topics were achieved with a strong weight on the title element and a very small weight on emphasized text, leading to a marginal improvement over the best post-submission runs for the mixed-monolingual task at WebCLEF 2005. For the WebCLEF 2006 topics, improved results were achieved for manually generated topics. The best performance on the manual WebCLEF 2006 topics was achieved with a strong weight on both the HTML title and H1 elements and a decreased weight for the other elements. Blind relevance feedback did not yet improve the results.
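
The field-specific weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the field names, the weight values in `FIELD_WEIGHTS`, the choice of emphasis tags, and the plain term-count scoring are all assumptions made for illustration, and a real retrieval engine would use a proper ranking function instead of raw counts.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: strong on title and H1, small on emphasized text.
FIELD_WEIGHTS = {"title": 4.0, "h1": 4.0, "emphasized": 0.5, "content": 1.0}
# Assumed set of "emphasizing" elements; the abstract does not list them.
EMPHASIS_TAGS = {"b", "strong", "em", "i"}


class FieldExtractor(HTMLParser):
    """Collects lowercased word tokens into one list per index field."""

    def __init__(self):
        super().__init__()
        self.fields = {name: [] for name in FIELD_WEIGHTS}
        self._open = []  # tracked tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1") or tag in EMPHASIS_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            # Close the most recently opened occurrence of this tag.
            idx = len(self._open) - 1 - self._open[::-1].index(tag)
            del self._open[idx]

    def handle_data(self, data):
        tokens = re.findall(r"\w+", data.lower())  # language-independent split
        if not tokens:
            return
        if "title" in self._open:
            self.fields["title"].extend(tokens)
            return
        # Body text always goes to the content field; H1 and emphasized
        # text are additionally copied into their own fields.
        self.fields["content"].extend(tokens)
        if "h1" in self._open:
            self.fields["h1"].extend(tokens)
        if any(tag in EMPHASIS_TAGS for tag in self._open):
            self.fields["emphasized"].extend(tokens)


def extract_fields(html):
    """Parse an HTML string and return a dict of field name -> token list."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


def score(query, fields, weights=FIELD_WEIGHTS):
    """Weighted sum of query-term counts over all fields (illustrative only)."""
    terms = re.findall(r"\w+", query.lower())
    total = 0.0
    for name, tokens in fields.items():
        counts = Counter(tokens)
        total += weights[name] * sum(counts[t] for t in terms)
    return total


page = (
    "<html><head><title>Ministry of Finance</title></head>"
    "<body><h1>Finance</h1><p>The <b>budget</b> report.</p></body></html>"
)
fields = extract_fields(page)
print(score("finance", fields), score("budget", fields))
```

With these assumed weights, a query term that appears in the title and H1 contributes far more to the score than one that appears only in emphasized body text, which mirrors the weighting pattern the abstract reports as most effective.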