Enriching speech recognition with automatic detection of sentence boundaries and disfluencies

Authors:
Yang Liu; E. Shriberg; A. Stolcke; D. Hillard; M. Ostendorf; M. Harper
Affiliations:
Dept. of Comput. Sci., Univ. of Texas, Richardson, TX (Yang Liu; affiliations of the other authors not listed)
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Cited by 26 publications:

A new method for eliciting three speaking styles in the laboratory
Speech Communication

Recovering capitalization and punctuation marks for automatic speech recognition: Case study for Portuguese broadcast news
Speech Communication

Combining lexical, syntactic and prosodic cues for improved online dialog act tagging
Computer Speech and Language

Combining multiple information layers for the automatic generation of indicative meeting abstracts
ENLG '07 Proceedings of the Eleventh European Workshop on Natural Language Generation

Using integer linear programming for detecting speech disfluencies
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

Improved features and models for detecting edit disfluencies in transcribing spontaneous Mandarin speech
IEEE Transactions on Audio, Speech, and Language Processing

Formatting time-aligned ASR transcripts for readability
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Automatic comma insertion for Japanese text generation
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Integration of statistical models for dictation of document translations in a machine-aided human translation task
IEEE Transactions on Audio, Speech, and Language Processing

Interruption Point Detection of Spontaneous Speech Using Inter-Syllable Boundary-Based Prosodic Features
ACM Transactions on Asian Language Information Processing (TALIP)

Cross-domain speech disfluency detection
SIGDIAL '10 Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Panning for EBMT gold, or "Remembering not to forget"
Machine Translation

Question detection in spoken conversations using textual conversations
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2

Detection of agreement and disagreement in broadcast conversations
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2

Joint reranking of parsing and word recognition with automatic segmentation
Computer Speech and Language

Design, creation, and analysis of Czech corpora for structural metadata extraction from speech
Language Resources and Evaluation

Disfluencies and the perspective of prosodic fluency
COST'09 Proceedings of the Second international conference on Development of Multimodal Interfaces: active Listening and Synchrony

Revisiting centrality-as-relevance: support sets and similarity as geometric proximity
Journal of Artificial Intelligence Research

A monotonic statistical machine translation approach to speaking style transformation
Computer Speech and Language

Summarizing speech by contextual reinforcement of important passages
PROPOR'12 Proceedings of the 10th international conference on Computational Processing of the Portuguese Language

Spoken Content Retrieval: A Survey of Techniques and Technologies
Foundations and Trends in Information Retrieval

A readability evaluation of real-time crowd captions in the classroom
Proceedings of the 14th international ACM SIGACCESS conference on Computers and accessibility

Automatic assessment of expressive oral reading
Speech Communication

Towards the automatic detection of spontaneous agreement and disagreement based on nonverbal behaviour: A survey of related cues, databases, and tools
Image and Vision Computing

Speech for Content Creation
International Journal of Mobile Human Computer Interaction

Characterizing and detecting spontaneous speech: Application to speaker role recognition
Speech Communication

Abstract

Effective human and automatic processing of speech requires recovery of more than just the words. It also involves recovering phenomena such as sentence boundaries, filler words, and disfluencies, collectively referred to as structural metadata. We describe a metadata detection system that combines information from different types of textual knowledge sources with information from a prosodic classifier. We investigate maximum entropy and conditional random field models, as well as the predominant hidden Markov model (HMM) approach, and find that discriminative models generally outperform generative models. We report system performance on both broadcast news and conversational telephone speech tasks, illustrating significant performance differences across tasks and as a function of recognizer performance. The results represent the state of the art, as assessed in the NIST RT-04F evaluation.
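The abstract describes fusing textual evidence with a prosodic classifier and scoring the result in the NIST RT-04F evaluation. The sketch below is not the authors' system. It assumes that a textual model and a prosodic classifier each already output, for every inter-word boundary, a posterior probability that a sentence boundary occurs there. It combines the two by simple linear interpolation, which is one common fusion scheme. The 0.5 weight and 0.5 threshold are arbitrary assumptions. It then scores the decisions with a simplified boundary error rate in the spirit of the NIST metric: insertions plus deletions divided by the number of reference boundaries. This simplified rate ignores sentence-unit subtypes and word alignment. All function names are hypothetical.

```python
from typing import List, Sequence


def combine_posteriors(
    text_post: Sequence[float],
    prosody_post: Sequence[float],
    weight: float = 0.5,
) -> List[float]:
    """Linearly interpolate two per-boundary posteriors P(boundary | evidence).

    `weight` (an assumed tuning parameter) is the share given to the textual
    model; the remaining 1 - weight goes to the prosodic classifier.
    """
    if len(text_post) != len(prosody_post):
        raise ValueError("posteriors must be aligned boundary by boundary")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * t + (1.0 - weight) * p for t, p in zip(text_post, prosody_post)]


def decide_boundaries(posteriors: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Hypothesize a sentence boundary wherever the posterior exceeds the threshold."""
    return [p > threshold for p in posteriors]


def boundary_error_rate(hyp: Sequence[bool], ref: Sequence[bool]) -> float:
    """Simplified NIST-style error: (insertions + deletions) / reference boundaries."""
    if len(hyp) != len(ref):
        raise ValueError("hypothesis and reference must have the same length")
    n_ref = sum(ref)
    if n_ref == 0:
        raise ValueError("reference contains no boundaries")
    insertions = sum(1 for h, r in zip(hyp, ref) if h and not r)
    deletions = sum(1 for h, r in zip(hyp, ref) if r and not h)
    return (insertions + deletions) / n_ref


if __name__ == "__main__":
    # Made-up posteriors for the boundaries after four words.
    text_post = [0.1, 0.9, 0.4, 0.7]
    prosody_post = [0.3, 0.7, 0.2, 0.1]
    hyp = decide_boundaries(combine_posteriors(text_post, prosody_post))
    ref = [False, True, False, True]
    print(hyp)  # [False, True, False, False]
    print(boundary_error_rate(hyp, ref))  # 0.5 (one deletion, two reference boundaries)
```

Because insertions are counted against the number of reference boundaries, this kind of error rate can exceed 1.0 for a system that over-segments.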