Cross-lingual text fragment alignment using divergence from randomness

  • Authors:
  • Sirvan Yahyaei;Marco Bonzanini;Thomas Roelleke

  • Affiliations:
  • Queen Mary, University of London, London, UK;Queen Mary, University of London, London, UK;Queen Mary, University of London, London, UK

  • Venue:
  • SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
  • Year:
  • 2011

Abstract

This paper describes an approach to automatically aligning text fragments from two documents written in different languages. A text fragment is a sequence of consecutive sentences, and an aligned pair consists of two fragments, one from each document, that are related in content. Cross-lingual similarity between fragments is estimated with models of divergence from randomness. A set of aligned fragments is then selected on the basis of the similarity scores to provide an alignment between sections of the two documents. In the reported experiments, the divergence-based similarity measures perform strongly on cross-lingual fragment alignment.
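
To make the pipeline in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy bilingual dictionary for crossing languages, the standard geometric randomness model from the divergence-from-randomness family as the term weight, and a greedy selection of non-overlapping fragment pairs. All function names, the dictionary, and the example sentences are hypothetical.

```python
import math
from collections import Counter


def geometric_dfr_weight(tf, coll_freq, n_units):
    """Informativeness -log2 P(tf) under a geometric randomness model.

    lam is the mean frequency of the term per unit (here: per sentence).
    """
    lam = coll_freq / n_units
    return math.log2(1 + lam) + tf * math.log2((1 + lam) / lam)


def fragments(sentences, max_len):
    """All runs of 1..max_len consecutive sentences as (start, end, tokens)."""
    out = []
    for start in range(len(sentences)):
        last = min(start + max_len, len(sentences))
        for end in range(start + 1, last + 1):
            tokens = [t for s in sentences[start:end] for t in s.lower().split()]
            out.append((start, end, tokens))
    return out


def similarity(src_tokens, tgt_tokens, dictionary, coll_freq, n_units):
    """Sum the DFR weights of target terms that translate a source term."""
    translated = {dictionary[t] for t in src_tokens if t in dictionary}
    tf = Counter(tgt_tokens)
    return sum(
        geometric_dfr_weight(tf[t], coll_freq[t], n_units)
        for t in translated
        if t in tf
    )


def align(src_sents, tgt_sents, dictionary, max_len=1):
    """Greedily pick the best-scoring fragment pairs that do not overlap."""
    # Collection statistics are taken from the target document (an assumption).
    coll_freq = Counter(t for s in tgt_sents for t in s.lower().split())
    scored = []
    for s0, s1, s_tok in fragments(src_sents, max_len):
        for t0, t1, t_tok in fragments(tgt_sents, max_len):
            score = similarity(s_tok, t_tok, dictionary, coll_freq, len(tgt_sents))
            if score > 0:
                scored.append((score, (s0, s1), (t0, t1)))
    scored.sort(reverse=True)

    used_src, used_tgt, result = set(), set(), []
    for _, (s0, s1), (t0, t1) in scored:
        src_ids, tgt_ids = set(range(s0, s1)), set(range(t0, t1))
        if src_ids & used_src or tgt_ids & used_tgt:
            continue  # a sentence may belong to at most one aligned fragment
        used_src |= src_ids
        used_tgt |= tgt_ids
        result.append(((s0, s1), (t0, t1)))
    return sorted(result)


# Hypothetical toy data: an English and a French document of two sentences each.
toy_dictionary = {"the": "le", "cat": "chat", "sleeps": "dort",
                  "dog": "chien", "runs": "court"}
english = ["the cat sleeps", "the dog runs"]
french = ["le chat dort", "le chien court"]
pairs = align(english, french, toy_dictionary)
```

Each returned pair holds half-open sentence ranges `(start, end)` for the source and target fragment. With `max_len` greater than 1, the unnormalised sum tends to favour longer fragments, so a real system would need some form of length normalisation; that choice is left open here.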