Multi-document summarization using off the shelf compression software

Authors: Amardeep Grewal, Timothy Allison, Stanko Dimitrov, Dragomir Radev
Affiliation: University of Michigan (all authors)
Venue: HLT-NAACL-DUC '03, Proceedings of the HLT-NAACL 03 on Text summarization workshop - Volume 5
Year: 2003

Abstract

This study examines whether common off-the-shelf compression software such as gzip can be used both to enhance existing summaries and to produce summaries from scratch. Because gzip compresses a file largely by replacing repeated strings with references to their earlier occurrences, comparing the gzipped size of a summary with a given sentence to its gzipped size without that sentence indicates how much repetitive data the sentence contributes. We hypothesized that adding the sentence that increases the compressed size of the summary the most would give the summary the sentence with the most new information. In this study, the hypothesis held in many cases, to varying degrees.
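The selection rule above can be sketched as a greedy loop: for a current summary S and candidate sentence s, compute gain(s) = size(gzip(S + s)) - size(gzip(S)) and add the candidate with the largest gain. This is a minimal sketch, not the authors' implementation. It assumes Python's standard `gzip` module stands in for the gzip tool, that sentences are joined with a single space, and that selection stops at a fixed sentence budget; the helper names `gzipped_size`, `compression_gain`, and `greedy_summarize` and the `seed` parameter (an existing summary to enhance) are hypothetical.

```python
import gzip
from typing import List, Optional


def gzipped_size(text: str) -> int:
    """Size in bytes of `text` (UTF-8 encoded) after gzip compression."""
    return len(gzip.compress(text.encode("utf-8")))


def compression_gain(summary: List[str], sentence: str) -> int:
    """Bytes added to the gzipped summary when `sentence` is appended.

    Sentences are joined with a single space (an assumption). A larger
    gain means less of `sentence` already appears in the summary.
    """
    before = gzipped_size(" ".join(summary))
    after = gzipped_size(" ".join(summary + [sentence]))
    return after - before


def greedy_summarize(
    candidates: List[str],
    budget: int,
    seed: Optional[List[str]] = None,
) -> List[str]:
    """Greedily add the candidate with the largest compression gain.

    `seed` is a hypothetical existing summary to enhance; leave it empty
    to build a summary from scratch. Stops once the summary holds
    `budget` sentences or the candidate pool is exhausted.
    """
    summary = list(seed or [])
    pool = list(candidates)
    while pool and len(summary) < budget:
        # Recompress the whole summary for each candidate: simple, not fast.
        best = max(pool, key=lambda s: compression_gain(summary, s))
        summary.append(best)
        pool.remove(best)
    return summary


if __name__ == "__main__":
    docs = [
        "The storm flooded the coastal town and destroyed many homes.",
        "The storm flooded the coastal town and destroyed many homes.",
        "Officials announced emergency relief funds for affected families on Tuesday.",
    ]
    for sentence in greedy_summarize(docs, budget=2):
        print(sentence)
```

Running the example keeps only one copy of the repeated storm sentence: once one copy is in the summary, gzip encodes the second as a back-reference, so appending it adds only a few bytes. Recompressing the full summary for every candidate makes each step cost one compression per remaining sentence, which is adequate for short summaries but would need caching or an incremental compressor at scale.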