Corpus and evaluation measures for multiple document summarization with multiple sources

Authors:
Tsutomu Hirao;Takahiro Fukusima;Manabu Okumura;Chikashi Nobata;Hidetsugu Nanba
Affiliations:
NTT Communication Science Laboratories;Otemon Gakuin University;Tokyo Institute of Technology;Communication Research Laboratories;Hiroshima City University
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

References: 11
Cited by: 6

Constructing literature abstracts by computer: techniques and prospects
Information Processing and Management: an International Journal - Special issue on natural language processing and information retrieval

A trainable document summarizer
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval

The use of MMR, diversity-based reranking for reordering documents and producing summaries
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval

The decomposition of human-written summary sentences
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval

The automatic construction of large-scale corpora for summarization research
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval

A new approach to unsupervised text summarization
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval

SUMMAC: a text summarization evaluation
Natural Language Engineering

The automated acquisition of topic signatures for text summarization
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1

Fast generation of abstracts from general domain text corpora by extracting relevant sentences
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2

Evaluation challenges in large-scale document summarization
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1

Text summarization challenge 2: text summarization evaluation at NTCIR workshop 3
HLT-NAACL-DUC '03 Proceedings of the HLT-NAACL 03 on Text summarization workshop - Volume 5

Kernel-based approach for automatic evaluation of natural language generation technologies: application to automatic summarization
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

Text relatedness based on a word thesaurus
Journal of Artificial Intelligence Research

SPIDER: a system for paraphrasing in document editing and revision-applicability in machine translation pre-editing
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II

A latent topic extracting method based on events in a document and its application
HLT-SS '11 Proceedings of the ACL 2011 Student Session

Mining paraphrases from self-anchored web sentence fragments
PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases

Aligning needles in a haystack: paraphrase acquisition across the web
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing

Abstract

In this paper, we introduce a large-scale test collection for multiple document summarization, the Text Summarization Challenge 3 (TSC3) corpus, and detail its construction and evaluation measures. A distinguishing feature of the corpus is that it annotates not only the important sentences in a document set but also which of those sentences have the same content. We also define new evaluation metrics that take redundancy into account and discuss the effectiveness of redundancy minimization.