Improving multilingual summarization: using redundancy in the input to correct MT errors

Authors: Advaith Siddharthan; Kathleen McKeown
Affiliations: Columbia University, New York, NY; Columbia University, New York, NY
Venue: HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Year: 2005

Abstract

In this paper, we use the information redundancy in multilingual input to correct errors in machine translation and thus improve the quality of multilingual summaries. We consider the case of multi-document summarization, where the input documents are in Arabic, and the output summary is in English. Typically, information that makes it to a summary appears in many different lexical-syntactic forms in the input documents. Further, the use of multiple machine translation systems provides yet more redundancy, yielding different ways to realize that information in English. We demonstrate how errors in the machine translations of the input Arabic documents can be corrected by identifying and generating from such redundancy, focusing on noun phrases.
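
The idea of exploiting redundancy can be illustrated with a deliberately simple sketch. It is not the authors' method, which the abstract does not specify beyond "identifying and generating from such redundancy". It assumes the alternative English renderings of one noun phrase have already been collected (from different input documents or different MT systems) and picks the rendering that recurs most often. The function names `normalize` and `consensus_phrase` and the example phrases are hypothetical.

```python
from collections import Counter


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so trivially different renderings match."""
    return " ".join(phrase.lower().split())


def consensus_phrase(candidates: list[str]) -> str:
    """Return the candidate whose normalized form occurs most often.

    Assumption: all candidates are translations of the same source noun
    phrase. This is plain majority voting, used here only as a stand-in
    for a redundancy-based correction step.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    counts = Counter(normalize(c) for c in candidates)
    # max() keeps the first maximal item, so ties go to the earliest candidate.
    return max(candidates, key=lambda c: counts[normalize(c)])


# Hypothetical renderings of one noun phrase from several MT outputs.
renderings = [
    "the Palestinian president",
    "president Palestinian the",
    "The Palestinian President",
    "the Palestinian chairman",
]
best = consensus_phrase(renderings)
```

Here two of the four renderings agree after normalization, so the scrambled and the divergent variants are outvoted. A real system would also need to decide which phrases refer to the same entity before voting, which this sketch takes as given.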