Recognizing disfluencies in conversational speech

Authors:
M. Lease; M. Johnson; E. Charniak
Affiliations:
Dept. of Comput. Sci., Brown Univ., Providence, RI;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Citing 0
Cited 7

Natural language processing for information retrieval: the time is ripe (again)
Proceedings of the ACM first Ph.D. workshop in CIKM

Hybrid Multi-step Disfluency Detection
MLMI '08 Proceedings of the 5th international workshop on Machine Learning for Multimodal Interaction

From prepared speech to spontaneous speech recognition system: a comparative study applied to French language
CSTST '08 Proceedings of the 5th international conference on Soft computing as transdisciplinary science and technology

Improved features and models for detecting edit disfluencies in transcribing spontaneous Mandarin speech
IEEE Transactions on Audio, Speech, and Language Processing

Automatic indexing of speech segments with spontaneity levels on large audio database
Proceedings of the 2010 international workshop on Searching spontaneous conversational speech

Probabilistic dialogue models with prior domain knowledge
SIGDIAL '12 Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Characterizing and detecting spontaneous speech: Application to speaker role recognition
Speech Communication

Abstract

We present a system for modeling disfluency in conversational speech: repairs, fillers, and self-interruption points (IPs). For each sentence, candidate repair analyses are generated by a stochastic tree adjoining grammar (TAG) noisy-channel model. A probabilistic syntactic language model scores the fluency of each analysis, and a maximum-entropy model selects the most likely analysis given the language model score and other features. Fillers are detected independently via a small set of deterministic rules, and IPs are detected by combining the output of repair and filler detection modules. In the recent Rich Transcription Fall 2004 (RT-04F) blind evaluation, systems competed to detect these three forms of disfluency under two input conditions: a best-case scenario of manually transcribed words and a fully automatic case of automatic speech recognition (ASR) output. For all three tasks and on both types of input, our system was the top performer in the evaluation.
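
The pipeline described in the abstract (score candidate repair analyses, select one with a maximum-entropy model, detect fillers by rule, then combine both outputs into IPs) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the `Candidate` class, the feature names, the weights, the filler word list, and the boundary rule for IPs are all assumptions made for the example, and the TAG noisy-channel generator and syntactic language model are replaced by hand-written scores.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothetical repair analysis of a sentence."""
    edited: list    # per-word flag: True if the word is marked as repaired-over
    features: dict  # feature name -> value (names are assumptions)


def maxent_probs(candidates, weights):
    """Log-linear (maximum-entropy) distribution over the candidate analyses."""
    scores = [
        sum(weights.get(name, 0.0) * value for name, value in c.features.items())
        for c in candidates
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_analysis(candidates, weights):
    """Return the most probable candidate and its probability."""
    probs = maxent_probs(candidates, weights)
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best], probs[best]


# Assumed filler list; the paper's actual deterministic rules are not given here.
FILLERS = {"uh", "um"}


def detect_fillers(words):
    """Rule-based filler detection, independent of repair detection."""
    return [w.lower() in FILLERS for w in words]


def detect_ips(edited, filler):
    """Combine repair and filler output into interruption points.

    Assumed rule: an IP sits at the boundary after word i when a repaired
    span ends at i or a filler span begins at i + 1.
    """
    ips = []
    for i in range(len(edited) - 1):
        repair_ends = edited[i] and not edited[i + 1]
        filler_starts = filler[i + 1] and not filler[i]
        if repair_ends or filler_starts:
            ips.append(i)
    return ips


words = "i want uh i need a flight".split()
candidates = [
    # "i want" treated as repaired-over; scores are made up for illustration.
    Candidate([True, True, False, False, False, False, False],
              {"lm_logprob": -12.0, "channel_logprob": -3.0}),
    # No repair at all.
    Candidate([False] * 7, {"lm_logprob": -20.0, "channel_logprob": -1.0}),
]
weights = {"lm_logprob": 1.0, "channel_logprob": 1.0}  # assumed, not trained

best, prob = select_analysis(candidates, weights)
print(detect_ips(best.edited, detect_fillers(words)))  # → [1]
```

In this toy input the first analysis wins because its combined score is higher, and the single IP falls after "want", where the repaired span ends and the filler "uh" begins.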