Temporal Compression Of Speech: An Evaluation

Authors:
S. Tucker;S. Whittaker
Affiliations:
Dept. of Inf. Studies, Univ. of Sheffield, Sheffield
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2008

Abstract

Efficient browsing of speech recordings is problematic. The linear nature of speech, coupled with the lack of abstraction that the medium affords, means that listeners have to listen to long segments of a recording to locate points of interest. We explore temporal compression algorithms that attempt to reduce the amount of time users require to listen to speech recordings, while retaining the important content. This paper implements two main approaches to temporal compression: artificial speech rate alteration (speed-up) and unimportant segment removal (excision). We evaluate the effectiveness of these approaches by having listeners rate comprehension and listening effort for different types of temporal compression. For different compression levels, we compare performance of various implementations of speed-up and excision as well as techniques based on semantic features and acoustic features. Our results indicate that listeners prefer low compression levels, excision over speed-up, and algorithms based on semantic rather than acoustic features. Finally, listeners were negative about hybrid algorithms that used speed-up to indicate missing regions in an excised recording.
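
To make the two approaches named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a recording has already been split into segments that each carry an importance score; the `Segment` type, the `importance` field, the `keep_fraction` parameter, and the greedy selection rule are all hypothetical choices made for illustration. Speed-up is modelled only as its effect on listening time, and excision as keeping the highest-scoring segments that fit within a time budget.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float       # seconds from the beginning of the recording
    duration: float    # seconds
    importance: float  # hypothetical score; higher means more important


def speed_up_duration(total_duration: float, rate: float) -> float:
    """Listening time after a uniform speed-up by `rate` (2.0 = twice as fast)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return total_duration / rate


def excise(segments: list[Segment], keep_fraction: float) -> list[Segment]:
    """Keep the most important segments that fit the time budget.

    `keep_fraction` is an assumed definition of compression level: the
    fraction of the original duration the listener still has to hear.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    budget = keep_fraction * sum(s.duration for s in segments)
    kept: list[Segment] = []
    used = 0.0
    # Greedy choice: consider segments from most to least important.
    for seg in sorted(segments, key=lambda s: s.importance, reverse=True):
        if used + seg.duration <= budget:
            kept.append(seg)
            used += seg.duration
    # Play the surviving segments back in their original order.
    return sorted(kept, key=lambda s: s.start)
```

In this sketch the importance score could come from either semantic or acoustic features; the abstract reports that listeners preferred algorithms based on semantic features, but how those scores were computed is not specified here.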