Cross-language linking of news stories on the web using interlingual topic modelling

Authors:
Wim De Smet;Marie-Francine Moens
Affiliations:
Katholieke Universiteit Leuven, Leuven, Belgium;Katholieke Universiteit Leuven, Leuven, Belgium
Venue:
Proceedings of the 2nd ACM workshop on Social web search and mining
Year:
2009

Abstract

We have studied the problem of linking event information across different languages without the use of translation systems or dictionaries. The linking is based on interlingual information obtained through probabilistic topic models trained on comparable corpora written in two languages (in our case English and Dutch). To achieve this, we extend the Latent Dirichlet Allocation model to process documents in two languages. We demonstrate the validity of the learned interlingual topics in a document clustering task, where the evaluation is performed on Google News.
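
The abstract does not spell out how Latent Dirichlet Allocation is extended to two languages, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It assumes that each pair of comparable documents shares a single topic distribution while every language keeps its own topic-word distributions, and it fits the model by collapsed Gibbs sampling. The function name `bilingual_lda`, its parameters, and the toy word ids are all hypothetical.

```python
import numpy as np


def bilingual_lda(pairs, vocab_sizes, n_topics=2, alpha=0.1, beta=0.01,
                  n_iter=100, seed=0):
    """Collapsed Gibbs sampler for a two-language LDA variant (illustrative).

    Assumption (not taken from the paper): both documents of a comparable
    pair share one topic distribution; each language has its own vocabulary
    and its own topic-word distributions.

    pairs       -- list of (doc_lang0, doc_lang1); each doc is a list of word ids
    vocab_sizes -- (V0, V1), the vocabulary size of each language
    Returns theta (n_pairs x n_topics) and [phi_lang0, phi_lang1].
    """
    rng = np.random.default_rng(seed)
    n_docs = len(pairs)
    # Topic counts per document pair, pooled over both languages.
    doc_topic = np.zeros((n_docs, n_topics))
    # Per-language topic-word counts and per-topic totals.
    topic_word = [np.zeros((n_topics, v)) for v in vocab_sizes]
    topic_total = [np.zeros(n_topics) for _ in vocab_sizes]

    # Random initial topic assignment for every token.
    assign = []
    for d, pair in enumerate(pairs):
        doc_assign = []
        for lang, words in enumerate(pair):
            z = rng.integers(n_topics, size=len(words))
            for w, k in zip(words, z):
                doc_topic[d, k] += 1
                topic_word[lang][k, w] += 1
                topic_total[lang][k] += 1
            doc_assign.append(z)
        assign.append(doc_assign)

    for _ in range(n_iter):
        for d, pair in enumerate(pairs):
            for lang, words in enumerate(pair):
                z = assign[d][lang]
                v = vocab_sizes[lang]
                for i, w in enumerate(words):
                    k = z[i]
                    # Remove the token's current assignment from the counts.
                    doc_topic[d, k] -= 1
                    topic_word[lang][k, w] -= 1
                    topic_total[lang][k] -= 1
                    # Conditional over topics: shared document term times
                    # language-specific word term.
                    p = ((doc_topic[d] + alpha)
                         * (topic_word[lang][:, w] + beta)
                         / (topic_total[lang] + v * beta))
                    k = rng.choice(n_topics, p=p / p.sum())
                    z[i] = k
                    doc_topic[d, k] += 1
                    topic_word[lang][k, w] += 1
                    topic_total[lang][k] += 1

    # Smoothed point estimates of the distributions.
    theta = doc_topic + alpha
    theta /= theta.sum(axis=1, keepdims=True)
    phi = [(tw + beta) / (tw + beta).sum(axis=1, keepdims=True)
           for tw in topic_word]
    return theta, phi


# Toy comparable corpus: two document pairs, four word ids per language.
toy_pairs = [([0, 0, 1, 1], [0, 1, 0]),
             ([2, 3, 3, 2], [2, 2, 3])]
theta, phi = bilingual_lda(toy_pairs, vocab_sizes=(4, 4), n_topics=2)
```

Because the topics are shared across languages under this assumption, documents in either language can be mapped to the same topic space and then compared or clustered by their topic distributions, which is the kind of interlingual representation the clustering task above relies on.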