Using conditional random fields for sentence boundary detection in speech

Authors:
Yang Liu;Andreas Stolcke;Elizabeth Shriberg;Mary Harper
Affiliations:
ICSI, Berkeley;SRI and ICSI;SRI and ICSI;Purdue University
Venue:
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Year:
2005

Abstract

Sentence boundary detection in speech is important for enriching speech recognition output, making it easier for humans to read and for downstream modules to process. In previous work, we developed hidden Markov model (HMM) and maximum entropy (Maxent) classifiers that integrate textual and prosodic knowledge sources for detecting sentence boundaries. In this paper, we evaluate a conditional random field (CRF) on this task and compare its results with those of our prior models. We evaluate on two corpora (conversational telephone speech and broadcast news speech), using both human transcriptions and speech recognition output. In general, our CRF model yields a lower error rate than the HMM and Maxent models on the NIST sentence boundary detection task in speech, although the best results are achieved by three-way voting among the classifiers. This probably occurs because each model has different strengths and weaknesses for modeling the knowledge sources.
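
The three-way voting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the votes are combined, so the code below assumes a simple majority vote over hard per-word decisions. The function name `majority_vote`, the labels `SB` (sentence boundary) and `NB` (no boundary), and the example label sequences are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(*label_sequences: Sequence[str]) -> List[str]:
    """Combine per-word boundary labels from several classifiers.

    Each input sequence holds one label per inter-word position.
    The output keeps, at every position, the label chosen by the
    most classifiers. With three voters and two labels, ties
    cannot occur.
    """
    if not label_sequences:
        raise ValueError("at least one label sequence is required")
    length = len(label_sequences[0])
    if any(len(seq) != length for seq in label_sequences):
        raise ValueError("all label sequences must have the same length")

    voted = []
    for labels_at_position in zip(*label_sequences):
        # most_common(1) returns [(label, count)] for the top label.
        winner = Counter(labels_at_position).most_common(1)[0][0]
        voted.append(winner)
    return voted


# Hypothetical outputs of the three classifiers on a five-word utterance.
hmm_labels = ["NB", "SB", "NB", "NB", "SB"]
maxent_labels = ["NB", "SB", "SB", "NB", "NB"]
crf_labels = ["NB", "NB", "SB", "NB", "SB"]

combined = majority_vote(hmm_labels, maxent_labels, crf_labels)
print(combined)
```

In this toy example each classifier errs at a different position, so the vote recovers a label sequence that none of the three produced on its own. This matches the abstract's explanation that the models have different strengths and weaknesses.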