Multilingual Information Retrieval Based on Document Alignment Techniques

Authors:
Martin Braschler; Peter Schäuble
Venue:
ECDL '98 Proceedings of the Second European Conference on Research and Advanced Technology for Digital Libraries
Year:
1998

Abstract

A multilingual information retrieval method is presented in which the user formulates the query in his or her preferred language to retrieve relevant information from a multilingual document collection. The method involves mono- and cross-language searches as well as the merging of their results. We adopt a corpus-based approach in which documents in different languages are associated if they cover a similar story. The resulting comparable corpus enables two novel techniques that we have developed. First, it enables Cross-Language Information Retrieval (CLIR) that does not suffer from the lack of vocabulary coverage we observed in approaches based on automatic Machine Translation (MT). Second, the aligned documents of this corpus facilitate merging the results of mono- and cross-language searches. Using the TREC CLIR data, excellent results are obtained. In addition, our evaluation of the document alignments gives us new insights into the usefulness of comparable corpora.
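
The abstract does not spell out how the comparable corpus is used at query time. As a rough illustration only, the Python sketch below assumes one plausible reading: the query is first run against documents in its own language, the alignments are followed to the associated documents in the other language, and the most frequent terms of those documents serve as the cross-language query. The function names, the toy documents, the alignment table, and the term-count scoring are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (toy assumption, no stemming or stopwords)."""
    return text.lower().split()


def search(query_terms, docs):
    """Rank documents by how often the query terms occur in them."""
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def cross_language_query(query_terms, source_docs, target_docs, alignments,
                         top_docs=2, top_terms=2):
    """Build a target-language query from documents aligned to the top source hits."""
    term_counts = Counter()
    for doc_id, _ in search(query_terms, source_docs)[:top_docs]:
        target_id = alignments.get(doc_id)
        if target_id is not None:
            term_counts.update(tokenize(target_docs[target_id]))
    ranked_terms = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked_terms[:top_terms]]


# Hypothetical toy collections and alignments (source id -> target id).
german_docs = {
    "de1": "wahl parlament wahl ergebnis",
    "de2": "fussball spiel tor",
}
english_docs = {
    "en1": "election parliament election result",
    "en2": "football match goal",
    "en3": "election campaign",
}
alignments = {"de1": "en1", "de2": "en2"}

target_query = cross_language_query(["wahl"], german_docs, english_docs, alignments)
results = search(target_query, english_docs)
print(target_query)
print(results)
```

In this toy setup the derived English query also retrieves `en3`, which has no aligned German counterpart, so the alignments act only as a bridge between vocabularies rather than as a lookup table of translated documents. The merging step mentioned in the abstract is not sketched here.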