Cross-lingual text fragment alignment using divergence from randomness

Authors:
Sirvan Yahyaei;Marco Bonzanini;Thomas Roelleke
Affiliations:
Queen Mary, University of London, London, UK;Queen Mary, University of London, London, UK;Queen Mary, University of London, London, UK
Venue:
SPIRE'11: Proceedings of the 18th International Conference on String Processing and Information Retrieval
Year:
2011

Abstract

This paper describes an approach to automatically aligning text fragments across two documents written in different languages. A text fragment is a sequence of contiguous sentences, and an aligned pair consists of one fragment from each document such that the two fragments are related in content. The cross-lingual similarity between fragments is estimated with models based on divergence from randomness. Based on these similarity scores, a set of aligned fragment pairs is selected to align sections of the two documents. In the experiments performed, divergence-based similarity measures showed strong performance for cross-lingual fragment alignment.
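The pipeline outlined in the abstract (enumerate fragments of contiguous sentences, score fragment pairs with a divergence-from-randomness model, select a set of aligned pairs) can be sketched in Python as follows. This is a minimal sketch under assumptions the abstract does not confirm. The source sentences are assumed to have already been mapped into target-language tokens by some translation step that is not shown. The DFR model used here is InL2 (inverse-document-frequency informative content, Laplace after-effect, term-frequency normalization 2), with each candidate target fragment treated as a "document". Pairs are chosen greedily by score so that no sentence belongs to two pairs. The function names `fragments`, `inl2_scorer` and `align` and the parameters `max_len`, `c` and `threshold` are hypothetical.

```python
import math
from collections import Counter


def fragments(sentences, max_len):
    """Enumerate all runs of 1..max_len contiguous sentences.

    `sentences` is a list of token lists; each fragment is returned as
    (start, end, tokens) with `end` exclusive.
    """
    out = []
    for start in range(len(sentences)):
        for end in range(start + 1, min(start + max_len, len(sentences)) + 1):
            tokens = [tok for sent in sentences[start:end] for tok in sent]
            out.append((start, end, tokens))
    return out


def inl2_scorer(collection, c=1.0):
    """Return an InL2 (DFR) scoring function.

    Assumption: the fragments in `collection` play the role of documents,
    so N, document frequencies and the average length come from them.
    """
    n_docs = len(collection)
    doc_freq = Counter()
    for _, _, toks in collection:
        doc_freq.update(set(toks))
    avg_len = sum(len(toks) for _, _, toks in collection) / n_docs

    def score(query_tokens, doc_tokens):
        tf = Counter(doc_tokens)
        length = len(doc_tokens)
        total = 0.0
        for term, qtf in Counter(query_tokens).items():
            if tf[term] == 0:
                continue  # unmatched terms contribute nothing
            # Normalization 2: rescale tf relative to the average fragment length.
            tfn = tf[term] * math.log2(1 + c * avg_len / length)
            # I(n): informative content from the fragment frequency of the term.
            inf = tfn * math.log2((n_docs + 1) / (doc_freq[term] + 0.5))
            # L: Laplace after-effect, 1 / (tfn + 1).
            total += qtf * inf / (tfn + 1)
        return total

    return score


def align(src_sents, tgt_sents, max_len=3, threshold=0.0):
    """Greedily pick the best-scoring fragment pairs whose sentences do not overlap.

    Assumption: `src_sents` are already expressed in the target vocabulary.
    """
    src_frags = fragments(src_sents, max_len)
    tgt_frags = fragments(tgt_sents, max_len)
    score = inl2_scorer(tgt_frags)

    candidates = []
    for s0, s1, s_toks in src_frags:
        for t0, t1, t_toks in tgt_frags:
            sim = score(s_toks, t_toks)
            if sim > threshold:
                candidates.append((sim, (s0, s1), (t0, t1)))
    candidates.sort(key=lambda cand: cand[0], reverse=True)

    used_src, used_tgt, chosen = set(), set(), []
    for sim, (s0, s1), (t0, t1) in candidates:
        src_span, tgt_span = set(range(s0, s1)), set(range(t0, t1))
        if src_span & used_src or tgt_span & used_tgt:
            continue  # a sentence may belong to at most one aligned pair
        used_src |= src_span
        used_tgt |= tgt_span
        chosen.append(((s0, s1), (t0, t1), sim))
    return sorted(chosen)  # ordered by source position
```

Called as `align(src_sents, tgt_sents, max_len=2)` on lists of tokenized sentences, the sketch returns `(source_span, target_span, score)` triples with end-exclusive sentence indices. Because the raw InL2 sum grows with the number of matched query terms, longer fragments tend to outscore shorter ones; dividing by query length or applying some other normalization is one possible adjustment, and is likewise an assumption rather than the paper's method.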