Corpus-based cross-language information retrieval in retrieval of highly relevant documents

Authors:
Tuomas Talvensaari;Martti Juhola;Jorma Laurikkala;Kalervo Järvelin
Affiliations:
Department of Computer Sciences, University of Tampere, Kanslerinrinne 1, FIN-33014 University of Tampere, Finland.;Department of Computer Sciences, University of Tampere, Kanslerinrinne 1, FIN-33014 University of Tampere, Finland.;Department of Computer Sciences, University of Tampere, Kanslerinrinne 1, FIN-33014 University of Tampere, Finland.;Department of Information Studies, University of Tampere, Kanslerinrinne 1, FIN-33014 University of Tampere, Finland
Venue:
Journal of the American Society for Information Science and Technology
Year:
2007

Abstract

The ability of information retrieval systems to retrieve highly relevant documents has become increasingly important in the age of extremely large collections, such as the World Wide Web (WWW). The authors' aim was to find out how well corpus-based cross-language information retrieval (CLIR) performs in retrieving highly relevant documents. They created a Finnish–Swedish comparable corpus from two loosely related document collections and used it as a source of knowledge for query translation. Finnish test queries were translated into Swedish and run against a Swedish test collection. Graded relevance assessments were used in evaluating the results, and three relevance criterion levels (liberal, regular, and stringent) were applied. The runs were also evaluated with generalized recall and precision, which weight the retrieved documents according to their relevance level. The performance of the Comparable Corpus Translation system (COCOT) was compared to that of a dictionary-based query translation program; the two translation methods were also combined. The results indicate that corpus-based CLIR performs particularly well with highly relevant documents. In average precision, COCOT even matched the monolingual baseline on the highest relevance level. The performance of the different query translation methods was further analyzed by identifying the reasons for poor rankings of highly relevant documents. © 2007 Wiley Periodicals, Inc.
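
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes a four-point relevance scale (0 to 3), a linear weight of grade divided by the maximum grade for the generalized measures, and a mapping of the liberal, regular, and stringent criteria to minimum grades 1, 2, and 3. The abstract does not specify the scale, the weights, or the thresholds, and all function and variable names here are hypothetical.

```python
from typing import Dict, List

# Assumed mapping from criterion name to the minimum grade counted as relevant.
THRESHOLDS: Dict[str, int] = {"liberal": 1, "regular": 2, "stringent": 3}


def binary_precision(ranked: List[str], grades: Dict[str, int], criterion: str) -> float:
    """Precision after binarizing graded judgments at the given criterion level."""
    if not ranked:
        return 0.0
    min_grade = THRESHOLDS[criterion]
    hits = sum(1 for doc in ranked if grades.get(doc, 0) >= min_grade)
    return hits / len(ranked)


def generalized_precision(ranked: List[str], grades: Dict[str, int], max_grade: int = 3) -> float:
    """Mean relevance weight of the retrieved documents (assumed weight: grade / max_grade)."""
    if not ranked:
        return 0.0
    return sum(grades.get(doc, 0) / max_grade for doc in ranked) / len(ranked)


def generalized_recall(ranked: List[str], grades: Dict[str, int], max_grade: int = 3) -> float:
    """Share of the collection's total relevance weight captured by the retrieved documents."""
    total = sum(g / max_grade for g in grades.values())
    if total == 0:
        return 0.0
    # Assumes `ranked` contains no duplicate document ids.
    return sum(grades.get(doc, 0) / max_grade for doc in ranked) / total


# Toy judgments and a toy result list (illustrative data only).
grades = {"d1": 3, "d2": 1, "d3": 0, "d4": 2}
ranked = ["d1", "d3", "d2"]

gp = generalized_precision(ranked, grades)
gr = generalized_recall(ranked, grades)
p_liberal = binary_precision(ranked, grades, "liberal")
p_stringent = binary_precision(ranked, grades, "stringent")
```

With binarized judgments, a marginally relevant document counts as much as a highly relevant one under the liberal criterion and not at all under the stringent one. The generalized measures instead give each retrieved document partial credit in proportion to its grade.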