Speech segmentation without speech recognition

Authors:
Dong Wang;Lie Lu;Hong-Jiang Zhang
Affiliations:
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China;Inst. for Human-Comput. Commun., Technische Univ. Munchen, Germany;Inst. for Human-Comput. Commun., Technische Univ. Munchen, Germany
Venue:
ICME '03 Proceedings of the 2003 International Conference on Multimedia and Expo - Volume 2
Year:
2003

Abstract

In this paper, we present a semantic speech segmentation approach, specifically sentence segmentation, that does not rely on speech recognition. To obtain phoneme-level information without word recognition, a novel vowel/consonant/pause (V/C/P) classification is proposed. An adaptive pause detection method is also presented to cope with varying background and environmental conditions. Three feature sets, covering pause, rate of speech, and prosody, are used to discriminate sentence boundaries. Experiments on broadcast news indicate that the proposed algorithm performs satisfactorily.
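
The abstract does not say how the adaptive pause detection works, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes an energy-based detector whose threshold is derived from each recording's own quietest and loudest frames, so the same settings carry over between quiet and noisy conditions. The function names and the parameters `tail`, `alpha`, and `min_frames` are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def adaptive_threshold(energies, tail=0.1, alpha=0.1):
    """Place the threshold a fraction `alpha` of the way from the
    noise floor to the peak level, both estimated from the recording
    itself (assumed heuristic, not taken from the paper)."""
    ordered = sorted(energies)
    k = max(1, int(len(ordered) * tail))
    floor = sum(ordered[:k]) / k    # average of the quietest frames
    peak = sum(ordered[-k:]) / k    # average of the loudest frames
    return floor + alpha * (peak - floor)


def detect_pauses(energies, min_frames=3, tail=0.1, alpha=0.1):
    """Return (start, end) frame index pairs, end exclusive, for runs
    of at least `min_frames` frames below the adaptive threshold."""
    threshold = adaptive_threshold(energies, tail, alpha)
    pauses, start = [], None
    # The infinite sentinel closes a pause that runs to the end.
    for i, e in enumerate(list(energies) + [float("inf")]):
        if e < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                pauses.append((start, i))
            start = None
    return pauses
```

Because the threshold is relative to the recording's own dynamic range, scaling the whole signal up or down leaves the detected pauses unchanged, whereas a fixed energy threshold would not. A detector of this kind would supply only the pause feature set; the rate-of-speech and prosody features mentioned in the abstract are not covered here.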