Mutual-reinforcement document summarization using embedded graph based sentence clustering for storytelling

  • Authors:
  • Zhengchen Zhang;Shuzhi Sam Ge;Hongsheng He

  • Affiliations:
  • Department of Electrical & Computer Engineering, National University of Singapore, Singapore 117576, Singapore and Social Robotics Lab, Interactive Digital Media Institute, National University of ... (all three authors)

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2012

Abstract

In this paper, a document summarization framework for storytelling is proposed to extract essential sentences from a document by exploiting the mutual effects between terms, sentences and clusters. The framework has three phases: document modeling, sentence clustering and sentence ranking. The story document is modeled as a weighted graph whose vertices represent the sentences of the document. The sentences are clustered into groups to find the latent topics in the story. To alleviate the influence of unrelated sentences on clustering, an embedding process is employed to optimize the document model. The sentences are then ranked according to the mutual effects between terms, sentences and clusters, and the highest-ranked sentences are selected to form the summary of the document. Experimental results on the Document Understanding Conference (DUC) data sets demonstrate the effectiveness of the proposed method in document summarization. The results also show that the embedding process for sentence clustering renders the system more robust with respect to different cluster numbers.
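
The abstract gives no formulas, so the following is only an illustrative sketch of two of the ideas it names, not the authors' method. It assumes cosine similarity over raw term counts as the edge weight of the sentence graph, and a HITS-style power iteration as the mutual reinforcement between terms and sentences. The clustering and embedding steps are omitted, and all function names (`build_graph`, `rank_sentences`, `summarize`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(sentences):
    """Weighted sentence graph as an adjacency matrix.

    Assumption: the edge weight is the cosine similarity of term counts.
    """
    vecs = [Counter(tokenize(s)) for s in sentences]
    n = len(vecs)
    return [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def rank_sentences(sentences, iterations=50):
    """Score sentences by term-sentence mutual reinforcement (power iteration)."""
    vecs = [Counter(tokenize(s)) for s in sentences]
    terms = sorted({t for v in vecs for t in v})
    sent_score = [1.0] * len(sentences)
    for _ in range(iterations):
        # A term is important if it occurs in important sentences.
        term_score = {t: sum(v[t] * s for v, s in zip(vecs, sent_score))
                      for t in terms}
        t_norm = math.sqrt(sum(x * x for x in term_score.values())) or 1.0
        term_score = {t: x / t_norm for t, x in term_score.items()}
        # A sentence is important if it contains important terms.
        sent_score = [sum(c * term_score[t] for t, c in v.items()) for v in vecs]
        s_norm = math.sqrt(sum(x * x for x in sent_score)) or 1.0
        sent_score = [x / s_norm for x in sent_score]
    return sent_score


def summarize(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
    return [sentences[i] for i in top]
```

In this sketch the graph from `build_graph` is what a clustering stage would operate on, while `summarize` ranks on term overlap alone. A fuller version would add a third score vector for clusters and let it reinforce the sentence scores in the same loop.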