A Probabilistic Generative Framework for Extractive Broadcast News Speech Summarization

Authors:
Yi-Ting Chen;B. Chen;Hsin-Min Wang
Affiliations:
Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Normal Univ., Taipei;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2009

Citing 0
Cited 5

Word Topic Models for Spoken Document Retrieval and Transcription

ACM Transactions on Asian Language Information Processing (TALIP)
A risk minimization framework for extractive speech summarization

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Summarizing speech by contextual reinforcement of important passages

PROPOR'12 Proceedings of the 10th international conference on Computational Processing of the Portuguese Language
Spoken Content Retrieval: A Survey of Techniques and Technologies

Foundations and Trends in Information Retrieval
Extractive speech summarization using evaluation metric-related training criteria

Information Processing and Management: an International Journal

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we consider extractive summarization of broadcast news speech and propose a unified probabilistic generative framework that combines the sentence generative probability and the sentence prior probability for sentence ranking. Each sentence of a spoken document to be summarized is treated as a probabilistic generative model for predicting the document. Two matching strategies, namely literal term matching and concept matching, are thoroughly investigated. We explore the use of the language model (LM) and the relevance model (RM) for literal term matching, while the sentence topical mixture model (STMM) and the word topical mixture model (WTMM) are used for concept matching. In addition, the lexical and prosodic features, as well as the relevance information of spoken sentences, are properly incorporated for the estimation of the sentence prior probability. An elegant feature of our proposed framework is that both the sentence generative probability and the sentence prior probability can be estimated in an unsupervised manner, without the need for handcrafted document-summary pairs. The experiments were performed on Chinese broadcast news collected in Taiwan, and very encouraging results were obtained.