Extractive speech summarization using shallow rhetorical structure modeling

Authors:
Justin Jian Zhang; Ricky Ho Yin Chan; Pascale Fung
Affiliations:
Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (all three authors)
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

We propose an extractive summarization approach with a novel shallow rhetorical structure learning framework for speech summarization. One of the most under-utilized features in extractive summarization is hierarchical structure information: the semantically cohesive units that are hidden in spoken documents. We first present empirical evidence that rhetorical structure is the underlying semantic information, which is rendered in linguistic and acoustic/prosodic forms in lecture speech. To test this hypothesis, we first propose a segmental summarization method in which the document is partitioned into rhetorical units by K-means clustering. We show that this system produces summaries at 67.36% ROUGE-L F-measure, a 4.29% absolute increase in performance compared with that of the baseline system. We then propose Rhetorical-State Hidden Markov Models (RSHMMs) to automatically decode the underlying hierarchical rhetorical structure in speech. Tenfold cross-validation experiments are carried out on conference speeches. We show that a system based on RSHMMs gives a 71.31% ROUGE-L F-measure, an 8.24% absolute increase in lecture speech summarization performance compared with the baseline system without RSHMMs. Our method also outperforms the baseline with a conventional discourse feature. We also present a thorough investigation of the relative contribution of different features and show that, for lecture speech, speaker-normalized acoustic features contribute the most at 68.5% ROUGE-L F-measure, compared to 62.9% ROUGE-L F-measure for linguistic features and 59.2% ROUGE-L F-measure for un-normalized acoustic features. This shows that the individual speaking style of each speaker is highly relevant to the summarization.
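All results in the abstract are reported as ROUGE-L F-measure, which scores a candidate summary by its longest common subsequence (LCS) with a reference summary. The following is a minimal sketch of that metric, not the authors' evaluation setup: it assumes whitespace-tokenized input, a single reference, sequence-level LCS, and equal weighting of precision and recall (`beta=1.0`). The function names `lcs_length` and `rouge_l_f` are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Row-by-row dynamic programming; prev holds the previous row of the table.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(candidate, reference, beta=1.0):
    """ROUGE-L F-measure between two token lists (assumption: beta=1 gives F1)."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


# Hypothetical example sentences, not taken from the paper's data.
candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
score = rouge_l_f(candidate, reference)
print(f"ROUGE-L F = {score:.4f}")
```

Read this way, the two reported gains are consistent with a single baseline: 67.36% minus 4.29 and 71.31% minus 8.24 both give 63.07% ROUGE-L F-measure.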