Integrating clustering and multi-document summarization to improve document understanding

  • Authors:
  • Dingding Wang; Shenghuo Zhu; Tao Li; Yun Chi; Yihong Gong

  • Affiliations:
  • Florida International University, Miami, FL, USA; NEC Laboratories America, Cupertino, CA, USA; Florida International University, Miami, FL, USA; NEC Laboratories America, Cupertino, CA, USA; NEC Laboratories America, Cupertino, CA, USA

  • Venue:
  • Proceedings of the 17th ACM conference on Information and knowledge management
  • Year:
  • 2008

Abstract

Document understanding techniques such as document clustering and multi-document summarization have received much attention in recent years. Current document clustering methods usually represent documents as a term-document matrix and apply clustering algorithms to it. Although these methods can group documents satisfactorily, it remains hard for people to grasp what the documents mean, because no satisfactory interpretation is provided for each document cluster. In this paper, we propose a new language model that clusters and summarizes documents simultaneously. By exploiting the mutual influence between document clustering and summarization, our method yields (1) better document clustering with a more meaningful interpretation of each cluster and (2) better document summarization that takes document context information into account.
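
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of clustering on a term-document matrix. It is not the language model proposed in the paper. The function names (term_document_matrix, cosine, kmeans), the whitespace tokenizer, the toy documents, and the choice of a k-means-style procedure with cosine similarity are all assumptions made for illustration. The matrix is stored with one row per document.

```python
from collections import Counter
import math


def term_document_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def kmeans(rows, seeds, iterations=10):
    """k-means-style clustering: assign by cosine similarity, update centroids by mean.

    `seeds` lists the row indices used as initial centroids (hypothetical parameter,
    chosen here so the example is deterministic).
    """
    centroids = [[float(x) for x in rows[s]] for s in seeds]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each document goes to its most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda c: cosine(row, centroids[c]))
            for row in rows
        ]
        # Update step: each centroid becomes the mean of its member rows.
        for c in range(len(centroids)):
            members = [rows[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy corpus (invented for this example).
docs = [
    "apple banana apple",
    "banana apple fruit",
    "car engine wheel",
    "engine car road",
]
vocab, matrix = term_document_matrix(docs)
labels = kmeans(matrix, seeds=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

The output is only a list of cluster labels, which illustrates the gap the abstract points out: the grouping itself carries no readable interpretation of what each cluster is about.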