Cross-language linking of news stories on the web using interlingual topic modelling

  • Authors:
  • Wim De Smet;Marie-Francine Moens

  • Affiliations:
  • Katholieke Universiteit Leuven, Leuven, Belgium;Katholieke Universiteit Leuven, Leuven, Belgium

  • Venue:
  • Proceedings of the 2nd ACM workshop on Social web search and mining
  • Year:
  • 2009

Abstract

We have studied the problem of linking event information across different languages without the use of translation systems or dictionaries. The linking is based on interlingual information obtained through probabilistic topic models trained on comparable corpora written in two languages (in our case English and Dutch). To achieve this, we extend the Latent Dirichlet Allocation model to process documents in two languages. We demonstrate the validity of the learned interlingual topics in a document clustering task, where the evaluation is performed on Google News.
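
As a rough illustration of how a shared topic space allows linking without translation, the sketch below pairs each English story with the Dutch story whose topic distribution is closest. It is not the authors' method: the abstract does not say how documents are compared, so cosine similarity is an assumption, and the function names, story identifiers, and topic vectors are all hypothetical. The per-document topic distributions are taken as already inferred by some bilingual topic model.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def link_stories(english_topics, dutch_topics):
    """Map each English story id to the most similar Dutch story id.

    Both arguments map a story id to a distribution over the SAME
    interlingual topics, so no dictionary or translation step is needed.
    """
    links = {}
    for en_id, en_theta in english_topics.items():
        links[en_id] = max(
            dutch_topics,
            key=lambda nl_id: cosine(en_theta, dutch_topics[nl_id]),
        )
    return links


# Hypothetical distributions over three shared topics (made-up numbers).
english_topics = {
    "en_election": [0.8, 0.1, 0.1],
    "en_football": [0.1, 0.8, 0.1],
}
dutch_topics = {
    "nl_verkiezing": [0.7, 0.2, 0.1],
    "nl_voetbal": [0.05, 0.9, 0.05],
}

links = link_stories(english_topics, dutch_topics)
```

With these made-up vectors, the election story is linked to "nl_verkiezing" and the football story to "nl_voetbal". A clustering evaluation such as the one mentioned in the abstract would group stories rather than pair them one to one, so nearest-neighbour linking is only the simplest case.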