Combining wikipedia-based concept models for cross-language retrieval

Authors:
Benjamin Roth;Dietrich Klakow
Affiliations:
Spoken Language Systems, Saarland University, Germany;Spoken Language Systems, Saarland University, Germany
Venue:
IRFC'10 Proceedings of the First international Information Retrieval Facility conference on Adbances in Multidisciplinary Retrieval
Year:
2010

Citing 16
Cited 4

Generalized vector spaces model in information retrieval

SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic latent semantic indexing

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Unsupervised learning by probabilistic latent semantic analysis

Machine Learning
CLEF 2000 - Overview of Results

CLEF '00 Revised Papers from the Workshop of Cross-Language Evaluation Forum on Cross-Language Information Retrieval and Evaluation
Latent dirichlet allocation

The Journal of Machine Learning Research
LDA-based document models for ad-hoc retrieval

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Mining multilingual topics from wikipedia

Proceedings of the 18th international conference on World wide web
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
The ESA retrieval model revisited

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
PLDA: Parallel Latent Dirichlet Allocation for Large-Scale Applications

AAIM '09 Proceedings of the 5th International Conference on Algorithmic Aspects in Information and Management
Computing semantic relatedness using Wikipedia-based explicit semantic analysis

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Cross-language linking of news stories on the web using interlingual topic modelling

Proceedings of the 2nd ACM workshop on Social web search and mining
Explicit versus latent concept models for cross-language information retrieval

IJCAI'09 Proceedings of the 21st international jont conference on Artifical intelligence
Polylingual topic models

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
A Wikipedia-based multilingual retrieval model

ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
An experimental comparison of explicit semantic analysis implementations for cross-language retrieval

NLDB'09 Proceedings of the 14th international conference on Applications of Natural Language to Information Systems

Scaling up high-value retrieval to medium-volume data

IRFC'10 Proceedings of the First international Information Retrieval Facility conference on Adbances in Multidisciplinary Retrieval
Cross-language information retrieval with latent topic models trained on a comparable corpus

AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
Monolingual and cross-lingual probabilistic topic models and their applications in information retrieval

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Cross-language information retrieval models based on latent topic models trained with document-aligned comparable corpora

Information Retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

As a low-cost ressource that is up-to-date, Wikipedia recently gains attention as a means to provide cross-language brigding for information retrieval. Contradictory to a previous study, we show that standard Latent Dirichlet Allocation (LDA) can extract cross-language information that is valuable for IR by simply normalizing the training data. Furthermore, we show that LDA and Explicit Semantic Analysis (ESA) complement each other, yielding significant improvements when combined. Such a combination can significantly contribute to retrieval based on machine translation, especially when query translations contain errors. The experiments were perfomed on the Multext JOC corpus und a CLEF dataset.