Web retrieval experiments with the EuroGOV corpus at the University of Hildesheim
CLEF'05 Proceedings of the 6th international conference on Cross-Language Evaluation Forum: accessing Multilingual Information Repositories
Experiments with a multi-lingual web collection are presented. The EuroGOV corpus is the first multi-lingual web corpus for retrieval evaluation. We show how indexes based on words and n-grams are developed for different document parts. Separate indexes were built from the full document content, partial content, and the title. The best results were achieved for a title-only index based on words.
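As a rough illustration of the difference between word-based and n-gram-based indexes over individual document parts, the following minimal sketch builds an inverted index for one field at a time. It is not the system used in the experiments: the function names, the n-gram length of 4, the field names "title" and "content", and the sample documents are all assumptions made for this example.

```python
import re
from collections import defaultdict


def word_tokens(text):
    """Split text into lowercase word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def char_ngrams(text, n=4):
    """Return overlapping character n-grams taken within each word.

    Words no longer than n are kept whole. The value n=4 is an
    assumption for this sketch, not a setting taken from the paper.
    """
    grams = []
    for word in word_tokens(text):
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def build_index(docs, field, tokenizer):
    """Build an inverted index (term -> set of doc ids) for one field."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenizer(doc.get(field, "")):
            index[term].add(doc_id)
    return index


# Hypothetical sample documents with two parts each.
docs = {
    "d1": {"title": "Ministry of Finance", "content": "Budget and tax information"},
    "d2": {"title": "Finanzministerium", "content": "Informationen zum Haushalt"},
}

title_word_index = build_index(docs, "title", word_tokens)
title_ngram_index = build_index(docs, "title", char_ngrams)
```

In this sketch the word index on titles matches "finance" only in d1, while the n-gram index lets the shared fragment "fina" reach both the English and the German title, which is the usual motivation for n-grams in multi-lingual collections.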