Enhancing sentence-level clustering with ranking-based clustering framework for theme-based summarization

  • Authors:
  • Libin Yang;Xiaoyan Cai;Yang Zhang;Peng Shi

  • Affiliations:
  • College of Information Engineering, Northwest Agriculture and Forestry University, Xi'an Shaanxi, China;College of Information Engineering, Northwest Agriculture and Forestry University, Xi'an Shaanxi, China;College of Information Engineering, Northwest Agriculture and Forestry University, Xi'an Shaanxi, China;College of Engineering and Science, Victoria University, Melbourne, VIC 8001, Australia and Department of Computing and Mathematical Sciences, University of South Wales, Pontypridd CF37 1DL, Unite ...

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2014

Quantified Score

Hi-index 0.07

Abstract

Sentence clustering plays a pivotal role in theme-based summarization, which discovers topic themes, defined as clusters of highly related sentences, in order to avoid redundancy and cover more diverse information. Because sentences are short and contain limited content, the bag-of-words cosine similarity traditionally used for document clustering is no longer suitable, and measuring sentence similarity requires special treatment. In this paper, we propose a ranking-based clustering framework that uses the ranking distributions of documents and terms to help generate high-quality sentence clusters. The effectiveness of the proposed framework is demonstrated by both cluster quality analysis and summarization evaluation on the DUC 2004 and DUC 2007 datasets.
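The sparsity problem behind the abstract's claim about bag-of-words cosine similarity can be illustrated with a minimal sketch. This is the baseline measure the authors argue against, not their ranking-based framework; the tokenizer, the function names `bag_of_words` and `cosine_similarity`, and the example sentences are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count alphabetic tokens (no stemming, no stop-word removal)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    # Counter returns 0 for missing keys, so terms absent from b contribute nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example: two sentences on the same theme that share no words.
s1 = "The storm flooded coastal towns."
s2 = "Hurricane damage forced residents to evacuate."

# Longer passages on the same theme are more likely to share some terms.
doc1 = s1 + " Hurricane winds damaged homes."
doc2 = s2 + " The storm moved inland."

print(round(cosine_similarity(bag_of_words(s1), bag_of_words(s2)), 3))      # prints 0.0
print(round(cosine_similarity(bag_of_words(doc1), bag_of_words(doc2)), 3))  # prints 0.316
```

The two sentences are clearly related, yet their vectors are orthogonal and score 0. The longer passages overlap on only three terms ("the", "storm", "hurricane") and score about 0.32, which shows why a measure designed for documents gives little signal at the sentence level.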