Character N-Gram Tokenization for European Language Text Retrieval
Information Retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Exploring new languages with HAIRCUT at CLEF 2005
CLEF'05 Proceedings of the 6th international conference on Cross-Language Evaluation Forum: Accessing Multilingual Information Repositories
JHU/APL continued its participation in multilingual retrieval at CLEF in 2006. We again applied our hallmark technique for combating language diversity and morphological complexity: character n-gram tokenization. This year we participated in the ad hoc cross-language track and submitted both monolingual and bilingual runs. Our experimental results agree with our previous reports that n-grams perform especially well in linguistically complex languages, notably Bulgarian and Hungarian, where monolingual improvements of 27% and 70%, respectively, were observed compared to space-delimited word forms. As in CLEF 2005, our bilingual submissions made use of subword translation (statistical translation of character n-grams using aligned corpora) when parallel data were available, and web-based machine translation when no suitable data were available to us.
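To make the tokenization scheme concrete, the sketch below shows overlapping character n-grams over space-delimited text. This is an illustration only: the n-gram length (here 5) and the underscore boundary marker are assumptions for the example, not details given in the abstract.

```python
def char_ngrams(text, n=5):
    """Split text into overlapping character n-grams.

    Spaces are replaced by an underscore boundary marker and the string
    is padded at both ends, so n-grams can capture word boundaries.
    The marker and n=5 are illustrative choices, not the paper's exact
    configuration.
    """
    padded = "_" + text.strip().lower().replace(" ", "_") + "_"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

# An inflected word shares most of its n-grams with other forms of the
# same stem, which is why n-grams help in morphologically rich languages.
print(char_ngrams("juggling"))
```

Because any two surface forms of the same stem (e.g., "juggle", "juggling") share most of their interior n-grams, matching at the n-gram level approximates stemming without language-specific morphological analysis.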