Edit disfluency detection and correction using a cleanup language model and an alignment model

Authors:
Jui-Feng Yeh;Chung-Hsien Wu
Affiliations:
Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Citing 0
Cited 6

Ontology-based speech act identification in a bilingual dialog system using partial pattern trees
Journal of the American Society for Information Science and Technology

Improved features and models for detecting edit disfluencies in transcribing spontaneous Mandarin speech
IEEE Transactions on Audio, Speech, and Language Processing

Interruption Point Detection of Spontaneous Speech Using Inter-Syllable Boundary-Based Prosodic Features
ACM Transactions on Asian Language Information Processing (TALIP)

Contextual maximum entropy model for edit disfluency detection of spontaneous speech
ISCSLP'06 Proceedings of the 5th international conference on Chinese Spoken Language Processing

A monotonic statistical machine translation approach to speaking style transformation
Computer Speech and Language

Characterizing and detecting spontaneous speech: Application to speaker role recognition
Speech Communication

Abstract

This investigation presents a novel approach to detecting and correcting edit disfluencies in spontaneous speech. Hypothesis testing using acoustic features is first adopted to detect potential interruption points (IPs) in the input speech. The word order of the utterance is then cleaned up based on the potential IPs using a class-based cleanup language model, and the deletable region and the correction are aligned using an alignment model. Finally, log-linear weighting is applied to optimize the performance. Using the acoustic features, the IP detection rate is significantly improved, especially in recall. Based on the positions of the potential IPs, the cleanup language model and the alignment model are able to detect and correct edit disfluencies efficiently. Experimental results demonstrate that the proposed approach achieves error rates of 0.33 and 0.21 for IP detection and edit word deletion, respectively.
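
The log-linear weighting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two candidate corrections, the probabilities, and the weights below are all hypothetical, chosen only to show how log-probabilities from two models (for example a cleanup language model and an alignment model) might be combined into one score per candidate.

```python
import math


def log_linear_score(log_probs, weights):
    """Return the weighted sum of per-model log-probabilities."""
    if len(log_probs) != len(weights):
        raise ValueError("need exactly one weight per model score")
    return sum(w * lp for w, lp in zip(weights, log_probs))


def best_candidate(candidates, weights):
    """Pick the label of the candidate with the highest combined score.

    candidates: list of (label, [log_prob_model_1, log_prob_model_2, ...])
    """
    return max(candidates, key=lambda c: log_linear_score(c[1], weights))[0]


# Hypothetical model probabilities for two candidate cleanups of one utterance.
# Order of scores: [cleanup language model, alignment model] (assumed).
candidates = [
    ("keep all words", [math.log(0.02), math.log(0.10)]),
    ("delete reparandum", [math.log(0.08), math.log(0.30)]),
]

# Hypothetical weights; in practice they would be tuned on held-out data.
weights = [0.6, 0.4]

print(best_candidate(candidates, weights))
```

With these made-up numbers the second candidate scores higher under both models, so it is selected for any positive weights; the weights only matter when the models disagree.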