Extracting parallel fragments from comparable corpora for data-to-text generation

Authors:
Anja Belz; Eric Kow
Affiliations:
University of Brighton, Brighton, UK; University of Brighton, Brighton, UK
Venue:
INLG '10 Proceedings of the 6th International Natural Language Generation Conference
Year:
2010

Citing 15

Speaking the Users' Languages
IEEE Intelligent Systems

Preserving Ambiguities in Generation via Automata Intersection
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence

The Web as a parallel corpus
Computational Linguistics - Special issue on web as corpus

Accurate methods for the statistics of surprise and coincidence
Computational Linguistics - Special issue on using large corpora: I

Two-level, many-paths generation
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics

Identifying word translations in non-parallel texts
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics

Automatic identification of word translations from unrelated English and German corpora
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics

Improving Machine Translation Performance by Exploiting Non-Parallel Corpora
Computational Linguistics

Improving IBM word-alignment model 1
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics

Extracting parallel sub-sentential fragments from non-parallel corpora
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

Adaptive, intelligent presentation of information for the museum visitor in PEACH
User Modeling and User-Adapted Interaction

Automatic generation of textual summaries from neonatal intensive care data
Artificial Intelligence

Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models
Natural Language Engineering

System building cost vs. output quality in data-to-text generation
ENLG '09 Proceedings of the 12th European Workshop on Natural Language Generation

The TUNA challenge 2008: overview and evaluation results
INLG '08 Proceedings of the Fifth International Natural Language Generation Conference

Cited 1

Paraphrase fragment extraction from monolingual comparable corpora
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web

Abstract

Building NLG systems, in particular statistical ones, requires parallel data (paired inputs and outputs) which do not generally occur naturally. In this paper, we investigate the idea of automatically extracting parallel resources for data-to-text generation from comparable corpora obtained from the Web. We describe our comparable corpus of data and texts relating to British hills and the techniques for extracting paired input/output fragments we have developed so far.