Improved features and models for detecting edit disfluencies in transcribing spontaneous Mandarin speech

Authors:
Che-Kuang Lin;Lin-Shan Lee
Affiliations:
Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan;Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2009

Abstract

Detection of edit disfluencies is key to transcribing spontaneous utterances. In this paper, we present improved features and models to detect edit disfluencies and enhance transcription of spontaneous Mandarin speech using hypothesized disfluency interruption points (IPs) and edit word detection. A comprehensive set of prosodic features that takes into account the special characteristics of edit disfluencies in Mandarin is developed, and an improved model combining decision trees and maximum entropy is proposed to detect IPs. This model is further adapted to desired prosodic conditions by latent prosodic modeling, a probabilistic framework for analyzing speech prosody in terms of a set of latent prosodic states. These techniques contribute to higher recognition accuracy (by rescoring with the hypothesized IPs) and better edit word detection (using conditional random fields defined on Chinese characters) in the final transcription, as verified by experiments on a spontaneous Mandarin speech corpus.
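
As a rough illustration of the maximum entropy component mentioned in the abstract, the sketch below trains a binary maximum entropy classifier (equivalent to logistic regression) that scores a syllable boundary as an interruption point (IP) or not. It is a minimal sketch under stated assumptions, not the authors' model: the feature names `long_pause`, `pitch_reset`, and `repeated_syllable`, the toy data, and the function names are all hypothetical, and the decision-tree component and latent prosodic modeling described in the abstract are omitted.

```python
import math


def predict_ip_prob(weights, bias, feats):
    """Return P(IP | features) under a binary maximum entropy model."""
    score = bias + sum(weights.get(name, 0.0) * value
                       for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-score))


def train_maxent(samples, labels, epochs=2000, lr=0.5):
    """Fit weights by batch gradient ascent on the log-likelihood.

    samples: list of dicts mapping feature name -> value (hypothetical features)
    labels:  list of 0/1, where 1 marks an interruption point
    """
    weights, bias = {}, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad, grad_bias = {}, 0.0
        for feats, y in zip(samples, labels):
            err = y - predict_ip_prob(weights, bias, feats)
            grad_bias += err
            for name, value in feats.items():
                grad[name] = grad.get(name, 0.0) + err * value
        bias += lr * grad_bias / n
        for name, g in grad.items():
            weights[name] = weights.get(name, 0.0) + lr * g / n
    return weights, bias


# Toy data (invented for illustration): a boundary is labeled as an IP
# when at least two of the three indicator features fire.
samples = [
    {"long_pause": 1, "pitch_reset": 1},
    {"long_pause": 1, "repeated_syllable": 1},
    {"pitch_reset": 1, "repeated_syllable": 1},
    {"long_pause": 1, "pitch_reset": 1, "repeated_syllable": 1},
    {},
    {"long_pause": 1},
    {"pitch_reset": 1},
    {"repeated_syllable": 1},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_maxent(samples, labels)
```

Calling `predict_ip_prob(weights, bias, {"long_pause": 1, "pitch_reset": 1})` then gives a probability above 0.5, while a boundary with only `long_pause` set scores below 0.5. In a system like the one the abstract describes, such boundary probabilities would be the hypothesized IPs passed on to rescoring and edit word detection; how the paper combines them with decision trees is not shown here.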