Using parallel corpora for multilingual (multi-document) summarisation evaluation

Authors:
Marco Turchi;Josef Steinberger;Mijail Kabadjov;Ralf Steinberger
Affiliations:
European Commission-Joint Research Centre, IPSC-GlobSec, Ispra, VA, Italy;European Commission-Joint Research Centre, IPSC-GlobSec, Ispra, VA, Italy;European Commission-Joint Research Centre, IPSC-GlobSec, Ispra, VA, Italy;European Commission-Joint Research Centre, IPSC-GlobSec, Ispra, VA, Italy
Venue:
CLEF'10 Proceedings of the 2010 international conference on Multilingual and multimodal information access evaluation: cross-language evaluation forum
Year:
2010

Abstract

We present a method for evaluating multilingual multi-document summarisation that saves precious annotation time and makes evaluation results directly comparable across languages. The approach is based on manually selecting the most important sentences in a cluster of documents from a sentence-aligned parallel corpus and projecting that selection to the various target languages. We also present two ways of exploiting inter-annotator agreement levels, apply both to a baseline sentence-extraction summariser in seven languages, and discuss how the results differ between the two evaluation versions, together with a preliminary comparison between languages. The same method can in principle be used to evaluate single-document summarisers or information extraction tools.
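
The projection idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which two agreement-based variants were used, so the threshold and weighted scores below are assumptions chosen only for illustration, and all function names and toy data are hypothetical. The sketch relies on one property of a sentence-aligned parallel corpus, namely that sentence i in one language corresponds to sentence i in every other language, so a selection made once as a set of indices can be reused for each language.

```python
from collections import Counter


def vote_counts(annotations):
    """Count how many annotators selected each aligned sentence index."""
    return Counter(i for selected in annotations for i in set(selected))


def project(indices, corpus, lang):
    """Map language-independent sentence indices to the sentences of `lang`."""
    return [corpus[lang][i] for i in sorted(indices)]


def threshold_score(system, votes, min_votes):
    """Assumed variant 1: share of system sentences chosen by >= min_votes annotators."""
    if not system:
        return 0.0
    return sum(1 for i in system if votes[i] >= min_votes) / len(system)


def weighted_score(system, votes, n_annotators):
    """Assumed variant 2: each system sentence earns credit proportional to its votes."""
    if not system:
        return 0.0
    return sum(votes[i] for i in system) / (n_annotators * len(system))


# Toy aligned corpus (hypothetical data): index i is the same sentence in each language.
corpus = {
    "en": ["en-s0", "en-s1", "en-s2", "en-s3"],
    "it": ["it-s0", "it-s1", "it-s2", "it-s3"],
}
annotations = [[0, 2], [0, 3], [0, 2]]  # one index list per annotator
votes = vote_counts(annotations)

# Sentences selected by at least two annotators, projected to Italian.
gold = {i for i, v in votes.items() if v >= 2}
print(project(gold, corpus, "it"))

# Score a system that extracted sentences 0 and 3, under both assumed variants.
system = [0, 3]
print(threshold_score(system, votes, min_votes=2))
print(weighted_score(system, votes, n_annotators=len(annotations)))
```

Because both scores are computed on sentence indices rather than on text, the same annotation yields scores for every language in the corpus, which is what makes the results comparable across languages.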