Managing information disparity in multilingual document collections

  • Authors:
  • Kevin Duh;Ching-Man Au Yeung;Tomoharu Iwata;Masaaki Nagata

  • Affiliations:
  • NTT Communication Science Laboratories, Japan;NTT Communication Science Laboratories, Huawei, Hong Kong;NTT Communication Science Laboratories, Japan;NTT Communication Science Laboratories, Japan

  • Venue:
  • ACM Transactions on Speech and Language Processing (TSLP)
  • Year:
  • 2013

Abstract

Information disparity is a major challenge in multilingual document collections. When documents are dynamically updated in a distributed fashion, the information content of different language editions may gradually diverge. We propose a framework that helps human editors manage this information disparity, using tools from machine translation and machine learning. Given source and target documents in two different languages, our system automatically identifies information nuggets that are new with respect to the target and suggests positions at which to place their translations. We perform both real-world experiments and large-scale simulations on Wikipedia documents and conclude that our system is effective in a variety of scenarios.
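
To make the task described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' method. It assumes the source sentences have already been machine-translated into the target language, treats each sentence as one "nugget", and swaps in a plain bag-of-words cosine similarity with a fixed threshold in place of the learned models the paper refers to. The names `bag_of_words`, `cosine`, `suggest_insertions` and the `threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts (an assumed, deliberately simple representation)."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_insertions(translated_source, target, threshold=0.5):
    """Return (source_index, insert_after_target_index) for each new sentence.

    A source sentence counts as "new" when no target sentence reaches the
    similarity threshold. The suggested position is after the target sentence
    matched by the most recent covered source sentence (-1 means "at the start").
    """
    target_vectors = [bag_of_words(s) for s in target]
    suggestions = []
    last_matched = -1
    for i, sentence in enumerate(translated_source):
        vec = bag_of_words(sentence)
        scores = [cosine(vec, t) for t in target_vectors]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            last_matched = best  # content already present in the target
        else:
            suggestions.append((i, last_matched))
    return suggestions


if __name__ == "__main__":
    # Source edition, assumed to be already translated into the target language.
    source = [
        "Tokyo is the capital of Japan.",
        "It hosted the Olympic Games in 1964.",
        "The city has a large rail network.",
    ]
    target = [
        "Tokyo is the capital city of Japan.",
        "The city has a very large rail network.",
    ]
    print(suggest_insertions(source, target))  # [(1, 0)]
```

In this toy example the second source sentence has no close counterpart in the target, so it is flagged as new and placed after target sentence 0, which follows the order of the source document. A real system would need a proper translation step and a more robust notion of novelty than word overlap.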