The diversity-based approach to open-domain text summarization

  • Authors:
  • Tadashi Nomoto; Yuji Matsumoto

  • Affiliations:
  • National Institute of Japanese Literature, 1-16-10 Yutaka Shinagawa, Tokyo 142-8585, Japan; Nara Institute of Science and Technology, 8916-5 Takayama Ikoma, Nara 630-0129, Japan

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2003

Abstract

The paper introduces a novel approach to unsupervised text summarization, which in principle should work for any domain or genre. The novelty lies in exploiting the diversity of concepts in text for summarization, which has not received much attention in the summarization literature. We propose, in addition, what we call the information-centric approach to evaluation, where the quality of summaries is judged not in terms of how well they match human-created summaries but in terms of how well they represent their source documents in IR tasks such as document retrieval and text categorization. To assess the effectiveness of our approach under the proposed evaluation scheme, we set out to examine how a system with the diversity functionality performs against one without, using the test data known as BMIR-J2. The results demonstrate a clear superiority of the diversity-based approach over a non-diversity-based approach.

The paper also addresses the question of how closely the diversity approach models human judgments on summarization. We have created a relatively large volume of data annotated for relevance to summarization by human subjects. We have trained a decision tree-based summarizer using the data, and examined how the diversity method compares with the supervised method in performance when tested on the data. It was found that the diversity approach performs as well as, and in some cases better than, the supervised method.