Different indexing strategies for multilingual web retrieval: experiments with the EuroGOV corpus
CLEF'06 Proceedings of the 7th international conference on Cross-Language Evaluation Forum: evaluation of multilingual and multi-modal information retrieval
This paper describes web retrieval experiments with the EuroGOV corpus carried out at the University of Hildesheim. For both the multilingual and the mixed monolingual tasks, several indexing strategies were tested, all of them based on a single mixed-language index. After stopword removal, word-based and n-gram-based indexes were built from the full document content, from part of the content, and from the document title alone. Giving the terms in the original topic language a higher weight in the query and down-weighting the English translation led to better results for most settings. In post-submission runs for the multilingual task, a title-only run gave the best results.
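The following minimal sketch illustrates the general ideas. Documents in all languages go into one mixed-language index, stopwords are removed before word or character n-gram tokenization, and the query boosts original-language terms while down-weighting the English translation. Several details are hypothetical illustrations, not the paper's settings: the tiny stopword list, the n-gram length `n=4`, the weights `boost=2.0` and `penalty=0.5`, and the dot-product `score` function. The paper's actual retrieval system and parameters are not described here.

```python
from collections import Counter

# Hypothetical tiny multilingual stopword list (not the paper's actual lists).
STOPWORDS = {"the", "of", "and", "der", "die", "und", "le", "la", "et"}


def words(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def char_ngrams(text, n=4):
    """Character n-grams of each remaining word; n=4 is an assumed value."""
    grams = []
    for w in words(text):
        padded = f"_{w}_"  # mark word boundaries
        if len(padded) <= n:
            grams.append(padded)
        else:
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def index_document(text, tokenizer):
    """Term-frequency vector for one document field (full text, part, or title)."""
    return Counter(tokenizer(text))


def weighted_query(original, english, tokenizer, boost=2.0, penalty=0.5):
    """Combine the original-language topic with its English translation,
    boosting the former and down-weighting the latter (weights assumed)."""
    q = Counter()
    for t in tokenizer(original):
        q[t] += boost
    for t in tokenizer(english):
        q[t] += penalty
    return q


def score(query, doc_index):
    """Simple dot product of query weights and document term frequencies."""
    return sum(w * doc_index.get(t, 0) for t, w in query.items())


# Usage: a toy mixed-language index holding a German and an English document.
docs = {
    "d1": "Die Europäische Union und der Handel",
    "d2": "The European Union and trade policy",
}
idx = {d: index_document(t, words) for d, t in docs.items()}
q = weighted_query("Europäische Union", "European Union", words)
ranked = sorted(idx, key=lambda d: score(q, idx[d]), reverse=True)
print(ranked)  # ['d1', 'd2']
```

Because the German topic terms carry the higher weight, the German document outranks the English one even though both match "union". Swapping `words` for `char_ngrams` yields an n-gram index over the same text.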