A Comparative Study of Probabilistic Ranking Models for Chinese Spoken Document Summarization

  • Authors: Shih-Hsiang Lin; Berlin Chen; Hsin-Min Wang
  • Affiliations: National Taiwan Normal University; National Taiwan Normal University; Institute of Information Science, Academia Sinica
  • Venue: ACM Transactions on Asian Language Information Processing (TALIP)
  • Year: 2009

Abstract

Extractive document summarization automatically selects a number of indicative sentences, passages, or paragraphs from an original document according to a target summarization ratio, and sequences them to form a concise summary. In this article, we present a comparative study of various probabilistic ranking models for spoken document summarization, including supervised classification-based summarizers and unsupervised probabilistic generative summarizers. We also investigate the use of unsupervised summarizers to improve the performance of supervised summarizers when manual labels are not available for training the latter. A novel training data selection approach that leverages the relevance information of spoken sentences to select reliable document-summary pairs derived by the probabilistic generative summarizers is explored for training the classification-based summarizers. Encouraging initial results on Mandarin Chinese broadcast news data are demonstrated.
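
To make the extractive pipeline described in the abstract concrete, the following is a minimal sketch of an unsupervised generative ranker. It is not the article's implementation. It assumes one common formulation in which each sentence S is scored by the log-likelihood of the whole document under a unigram model of S, smoothed with a background model. The function names, the `smoothing` weight, the use of the document itself as the background model, and the sentence-count interpretation of the summarization ratio are all assumptions made for illustration.

```python
import math
from collections import Counter


def rank_sentences(sentences, smoothing=0.5):
    """Return log P(D | S) for each tokenized sentence S of document D.

    Assumption: the sentence unigram model is linearly interpolated with a
    background model estimated from the document itself, so every document
    word has non-zero probability.
    """
    doc_tokens = [tok for sent in sentences for tok in sent]
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)

    scores = []
    for sent in sentences:
        sent_counts = Counter(sent)
        sent_len = len(sent)
        log_lik = 0.0
        for word, count in doc_counts.items():
            p_sent = sent_counts[word] / sent_len if sent_len else 0.0
            p_background = count / doc_len
            p = smoothing * p_sent + (1.0 - smoothing) * p_background
            log_lik += count * math.log(p)
        scores.append(log_lik)
    return scores


def summarize(sentences, ratio=0.3):
    """Select the top-scoring sentences and return them in document order.

    Assumption: the summarization ratio is applied to the sentence count.
    """
    scores = rank_sentences(sentences)
    k = max(1, round(ratio * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(tokenized_sentences, ratio=0.3)` on a list of token lists (for example, recognizer output segmented into sentences) returns the selected sentences in their original order, which corresponds to the "sequences them to form a concise summary" step. A supervised summarizer would replace `rank_sentences` with the output of a trained classifier.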