This paper presents work on determining temporal audio segmentations at different semantic levels. The segmentation algorithm is based on the detection of relative silences, or pauses, using a perceptual loudness measure as the only feature. An adaptive threshold classifies each frame as pause or non-pause. The algorithm then determines perceptually relevant pause intervals for the different semantic levels by enforcing a minimum-duration and a maximum-interruption constraint. Experiments examining the influence of the individual parameters on the segmentation are presented. A new approach to evaluating segmentation accuracy is required. The results show that the simple perceptual pause concept is highly relevant when segmenting audio at different semantic levels.
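The pipeline the abstract describes (loudness-only feature, adaptive pause threshold, then minimum-duration and maximum-interruption constraints) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the adaptation rule (a fraction of the median loudness) and all parameter names and default values are assumptions introduced here.

```python
import numpy as np

def segment_pauses(loudness, frame_rate, min_pause_s=0.2,
                   max_interrupt_s=0.05, threshold_ratio=0.25):
    """Label frames pause/non-pause with an adaptive loudness threshold,
    then keep only pauses that satisfy the duration constraints.

    Assumptions (not from the paper): the threshold is a fixed fraction
    of the median loudness, and all defaults are illustrative.
    """
    loudness = np.asarray(loudness, dtype=float)
    # Adaptive threshold: a fraction of the overall median loudness.
    threshold = threshold_ratio * np.median(loudness)
    is_pause = loudness < threshold

    # Maximum-interruption constraint: bridge short non-pause gaps
    # inside a pause instead of splitting it.
    max_gap = int(max_interrupt_s * frame_rate)
    segments, start, gap = [], None, 0
    for i, p in enumerate(is_pause):
        if p:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap:                 # interruption too long: close pause
                segments.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:                     # flush a trailing pause
        segments.append((start, len(is_pause) - gap))

    # Minimum-duration constraint: discard pauses that are too short.
    min_len = int(min_pause_s * frame_rate)
    return [(s, e) for s, e in segments if e - s >= min_len]
```

Raising `min_pause_s` yields fewer, longer pauses and hence a coarser segmentation, which is one plausible way the same mechanism can target different semantic levels.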