A language modeling approach to information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Character N-Gram Tokenization for European Language Text Retrieval
Information Retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
s-grams: Defining generalized n-grams for information retrieval
Information Processing and Management: an International Journal
Don't have a stemmer?: be un+concern+ed
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
CLEF 2008: ad hoc track overview
CLEF'08 Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information access
For CLEF 2008, JHU conducted monolingual and bilingual experiments in the ad hoc TEL and Persian tasks. Additionally, we performed several post hoc experiments using previous CLEF ad hoc test sets in 13 languages. In all three tasks we explored alternative methods of tokenizing documents, including plain words, stemmed words, automatically induced segments, a single selected n-gram from each word, and all n-grams from words (i.e., traditional character n-grams). Character n-grams demonstrated consistent gains over ordinary words in each of these three diverse sets of experiments. Using mean average precision, relative gains of 50-200% on the TEL task, 5% on the Persian task, and 18% averaged over 13 languages from past CLEF evaluations were observed.
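The abstract's "traditional character n-grams" refers to taking all overlapping n-character substrings of each word as index terms. A minimal sketch of such a tokenizer (the choice of n=4 and the convention of keeping short words whole are assumptions for illustration, not details from the paper):

```python
def char_ngrams(word, n=4):
    """Return all overlapping character n-grams of a word.

    Words shorter than n are kept whole (an assumed convention)."""
    if len(word) < n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def tokenize(text, n=4):
    """Split text on whitespace and expand each word into its n-grams."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(char_ngrams(word, n))
    return tokens

print(char_ngrams("retrieval"))
# → ['retr', 'etri', 'trie', 'riev', 'ieva', 'eval']
```

Because n-grams straddle morpheme boundaries, variants like "retrieve" and "retrieval" share many index terms without any language-specific stemming, which is one reason the approach transfers across the 13 languages studied.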