One story, one flow: Hidden Markov Story Models for multilingual multidocument summarization

  • Authors:
  • Pascale Fung; Grace Ngai

  • Affiliations:
  • Hong Kong University of Science and Technology (HKUST), Clear Water Bay, Hong Kong; Hong Kong Polytechnic University, Kowloon, Hong Kong

  • Venue:
  • ACM Transactions on Speech and Language Processing (TSLP)
  • Year:
  • 2006


Abstract

This article presents a multidocument, multilingual, theme-based summarization system based on modeling text cohesion (story flow). Conventional extractive summarization systems, which pick out salient sentences to include in a summary, often disregard any flow or sequence that might exist between those sentences. We argue that such inherent text cohesion exists and is (1) specific to a particular story and (2) specific to a particular language. Documents within the same story, and in the same language, share a common story flow, and this flow differs across stories and across languages. We propose using Hidden Markov Models (HMMs) as story models. An unsupervised segmental K-means method is used to iteratively cluster multiple documents into different topics (stories) and learn the parameters of parallel Hidden Markov Story Models (HMSM), one for each story. We compare story models within and across stories, and within and across languages (English and Chinese). The experimental results support our “one story, one flow” and “one language, one flow” hypotheses. We also propose a Naïve Bayes classifier for document summarization. The performance of our summarizer is superior to that of conventional methods that do not incorporate text cohesion information. Our HMSM method also provides a simple way to compile a single metasummary for multiple documents from individual summaries via state-labeled sentences.
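To make the classification step concrete: a Naïve Bayes extractive summarizer of the kind the abstract describes scores each sentence as in-summary or out-of-summary from per-sentence features. The sketch below is a minimal illustration, not the paper's actual system; the feature names (a position bucket and an HMM state label) and the training examples are hypothetical stand-ins for whatever features the authors used.

```python
import math
from collections import defaultdict

def train_nb(examples):
    """Train a Naive Bayes model.

    examples: list of (features, label), where features is a dict of
    feature-name -> value and label is e.g. "summary" or "skip".
    """
    label_counts = defaultdict(int)          # P(label) counts
    feat_counts = defaultdict(dict)          # label -> {(name, value): count}
    for feats, label in examples:
        label_counts[label] += 1
        fc = feat_counts[label]
        for name, value in feats.items():
            fc[(name, value)] = fc.get((name, value), 0) + 1
    return label_counts, feat_counts

def classify(model, feats):
    """Return the most probable label under add-one smoothing."""
    label_counts, feat_counts = model
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label, c in label_counts.items():
        fc = feat_counts[label]
        lp = math.log(c / total)             # log prior
        for name, value in feats.items():
            # add-one smoothing over the values of this feature
            # observed with this label (plus one unseen slot)
            num = fc.get((name, value), 0) + 1
            vocab = {v for (n, v) in fc if n == name}
            lp += math.log(num / (c + len(vocab) + 1))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical training data: lead sentences in an early HMM state
# tend to be summary-worthy; later body sentences tend not to be.
train = [
    ({"position": "lead", "state": "s1"}, "summary"),
    ({"position": "lead", "state": "s1"}, "summary"),
    ({"position": "body", "state": "s2"}, "skip"),
    ({"position": "body", "state": "s3"}, "skip"),
]
model = train_nb(train)
print(classify(model, {"position": "lead", "state": "s1"}))  # → summary
```

Including the HMM state label as a feature is what lets the cohesion (story-flow) information influence sentence selection, which is the gap the abstract identifies in flow-agnostic extractive systems.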