This paper presents work on determining temporal audio segmentations at different semantic levels. The segmentation algorithm is based on the detection of relative silences, or pauses, using a perceptual loudness measure as the only feature. An adaptive threshold classifies each frame as pause or non-pause. The algorithm then determines perceptually relevant pause intervals for the different semantic levels by enforcing a minimum-duration and a maximum-interruption constraint. Experiments examining the influence of the individual parameters on the segmentation are presented. A new approach to evaluating segmentation accuracy is required. The results show that the simple perceptual pause concept is highly relevant when segmenting audio at different semantic levels.
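The pipeline the abstract describes (loudness-only feature, adaptive pause threshold, then minimum-duration and maximum-interruption constraints) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the adaptation rule (a fraction of the median loudness) and all parameter names and default values are assumptions introduced here.

```python
import numpy as np

def segment_pauses(loudness, frame_rate, min_pause_s=0.2,
                   max_interrupt_s=0.05, threshold_ratio=0.25):
    """Label frames pause/non-pause with an adaptive loudness threshold,
    then keep only pauses that satisfy the duration constraints.

    Assumptions (not from the paper): the threshold is a fixed fraction
    of the median loudness, and all defaults are illustrative.
    """
    loudness = np.asarray(loudness, dtype=float)
    # Adaptive threshold: a fraction of the overall median loudness.
    threshold = threshold_ratio * np.median(loudness)
    is_pause = loudness < threshold

    # Maximum-interruption constraint: bridge short non-pause gaps
    # inside a pause instead of splitting it.
    max_gap = int(max_interrupt_s * frame_rate)
    segments, start, gap = [], None, 0
    for i, p in enumerate(is_pause):
        if p:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap:                 # interruption too long: close pause
                segments.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:                     # flush a trailing pause
        segments.append((start, len(is_pause) - gap))

    # Minimum-duration constraint: discard pauses that are too short.
    min_len = int(min_pause_s * frame_rate)
    return [(s, e) for s, e in segments if e - s >= min_len]
```

Raising `min_pause_s` yields fewer, longer pauses and hence a coarser segmentation, which is one plausible way the same mechanism can target different semantic levels.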