Topic decomposition and summarization

  • Authors:
  • Wei Chen; Can Wang; Chun Chen; Lijun Zhang; Jiajun Bu

  • Affiliations:
  • College of Computer Science, Zhejiang University, Hangzhou, China (all five authors)

  • Venue:
  • PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
  • Year:
  • 2010

Abstract

In this paper, we study topic decomposition and summarization for a temporally sequenced text corpus on a specific topic. The task is to discover the different aspects of the topic (i.e., sub-topics) and the incidents related to each sub-topic, and to generate summaries for them. We present a solution with the following steps: (1) deriving sub-topics by applying Non-negative Matrix Factorization (NMF) to the terms-by-sentences matrix of the text corpus; (2) detecting the incidents of each sub-topic and generating summaries for both the sub-topic and its incidents by examining the constitution of its encoding vector generated by NMF; (3) ranking each sentence based on the encoding matrix and selecting the top-ranked sentences of each sub-topic as the summary of the text corpus. Experimental results show that the proposed topic decomposition method can effectively detect various aspects of the original documents. In addition, the proposed topic summarization method achieves better results than some well-studied methods.
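Steps (1) and (3) can be illustrated with a minimal sketch. It assumes raw term counts as matrix entries, the Lee and Seung multiplicative updates as the NMF solver, and a hypothetical ranking rule (`top_sentences_per_subtopic`) that scores sentence j for sub-topic i by its encoding weight H[i, j]. The abstract does not specify the paper's term weighting, NMF solver, or exact ranking function, so all of these are stand-ins, and incident detection (step 2) is not shown.

```python
import numpy as np


def nmf(A, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative terms-by-sentences matrix A as A ~ W @ H
    using Lee-Seung multiplicative updates for the Frobenius loss.

    W (terms x k): each column holds one sub-topic's term weights.
    H (k x sentences): each column is one sentence's encoding vector.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def top_sentences_per_subtopic(H, top_n=1):
    """Hypothetical ranking: for each sub-topic (row of H), return the
    indices of the top_n sentences with the largest encoding weight."""
    return [np.argsort(-H[i])[:top_n].tolist() for i in range(H.shape[0])]


# Toy corpus (invented for illustration): each string is one sentence.
sentences = [
    "earthquake hits city",
    "earthquake damage city buildings",
    "rescue teams arrive",
    "rescue teams search survivors",
]
vocab = sorted({w for s in sentences for w in s.split()})

# Terms-by-sentences count matrix: rows are terms, columns are sentences.
A = np.array([[s.split().count(t) for s in sentences] for t in vocab], dtype=float)

W, H = nmf(A, k=2)
summary = top_sentences_per_subtopic(H, top_n=1)
for i, idx in enumerate(summary):
    print(f"sub-topic {i}:", [sentences[j] for j in idx])
```

Picking sentences per row of H gives each sub-topic its own representative sentence, which matches the abstract's goal of covering every aspect in the summary; a real system would likely use a weighted term matrix (e.g., TF-IDF) and a richer scoring rule.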