Cross-language information retrieval with latent topic models trained on a comparable corpus

Authors:
Ivan Vulić;Wim De Smet;Marie-Francine Moens
Affiliations:
Department of Computer Science, K.U. Leuven, Belgium;Department of Computer Science, K.U. Leuven, Belgium;Department of Computer Science, K.U. Leuven, Belgium
Venue:
AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
Year:
2011

Citing 22
Cited 1

Probabilistic latent semantic indexing

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A guided tour to approximate string matching

ACM Computing Surveys (CSUR)
Cross-Language Information Retrieval

Cross-Language Information Retrieval
Cross-lingual relevance models

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Latent dirichlet allocation

The Journal of Machine Learning Research
Combining Multiple Strategies for Effective Monolingual and Cross-Language Retrieval

Information Retrieval
A study of smoothing methods for language models applied to information retrieval

ACM Transactions on Information Systems (TOIS)
LDA-based document models for ad-hoc retrieval

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Cross-language information retrieval using PARAFAC2

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Topic-bridged PLSA for cross-domain text classification

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Mining multilingual topics from wikipedia

Proceedings of the 18th international conference on World wide web
Cross-language linking of news stories on the web using interlingual topic modelling

Proceedings of the 2nd ACM workshop on Social web search and mining
Explicit versus latent concept models for cross-language information retrieval

IJCAI'09 Proceedings of the 21st international jont conference on Artifical intelligence
Polylingual topic models

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Multilingual topic models for unaligned text

UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Knowledge transfer for cross domain learning to rank

Information Retrieval
Cross-Language Information Retrieval

Cross-Language Information Retrieval
Cross-lingual keyword recommendation using latent topics

Proceedings of the 1st International Workshop on Information Heterogeneity and Fusion in Recommender Systems
Translingual document representations from discriminative projections

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Extracting multilingual topics from unaligned comparable corpora

ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
Combining wikipedia-based concept models for cross-language retrieval

IRFC'10 Proceedings of the First international Information Retrieval Facility conference on Adbances in Multidisciplinary Retrieval

Monolingual and cross-lingual probabilistic topic models and their applications in information retrieval

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper we study cross-language information retrieval using a bilingual topic model trained on comparable corpora such as Wikipedia articles. The bilingual Latent Dirichlet Allocation model (BiLDA) creates an interlingual representation, which can be used as a translation resource in many different multilingual settings as comparable corpora are available for many language pairs. The probabilistic interlingual representation is incorporated in a statistical language model for information retrieval. Experiments performed on the English and Dutch test datasets of the CLEF 2001-2003 CLIR campaigns show the competitive performance of our approach compared to cross-language retrieval methods that rely on pre-existing translation dictionaries that are hand-built or constructed based on parallel corpora.