N-best rescoring based on pitch-accent patterns

Authors:
Je Hun Jeon;Wen Wang;Yang Liu
Affiliations:
The University of Texas at Dallas;Speech Technology and Research Laboratory, SRI International;The University of Texas at Dallas
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Year:
2011

Abstract

In this paper, we adopt an n-best rescoring scheme that uses pitch-accent patterns to improve automatic speech recognition (ASR) performance. The pitch-accent model is decoupled from the main ASR system, which allows us to develop it independently. N-best hypotheses from the recognizers are rescored with additional scores that measure the correlation between the pitch-accent patterns predicted from the acoustic signal and those predicted from lexical cues. To test the robustness of our algorithm, we use two different data sets and recognition setups. The first is English radio news data with pitch-accent labels, where the recognizer is trained on a small amount of data and has a high error rate. The second is English broadcast news data recognized with a state-of-the-art SRI recognizer. Our experimental results show that the approach achieves a relative word error rate reduction of about 3%. This gain is consistent across the two tests, which suggests that incorporating prosodic information is a promising direction for improving speech recognition.
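
The rescoring idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the scoring formula, so everything concrete here is an assumption made for illustration: the `Hypothesis` fields, the agreement measure (one minus the mean absolute difference between per-word accent posteriors), the linear combination with a tunable `weight`, and the toy numbers. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    """One entry of an n-best list (all field names are hypothetical)."""
    words: List[str]
    asr_score: float               # recognizer log score for this hypothesis
    acoustic_accent: List[float]   # per-word P(accent) from an acoustic model
    lexical_accent: List[float]    # per-word P(accent) from the word sequence


def accent_agreement(acoustic: Sequence[float], lexical: Sequence[float]) -> float:
    """Assumed agreement measure: 1 - mean absolute difference of posteriors."""
    if len(acoustic) != len(lexical):
        raise ValueError("accent sequences must have the same length")
    if not acoustic:
        return 0.0
    diff = sum(abs(a - l) for a, l in zip(acoustic, lexical))
    return 1.0 - diff / len(acoustic)


def rescore(nbest: List[Hypothesis], weight: float = 1.0) -> List[Hypothesis]:
    """Return the n-best list sorted by ASR score plus weighted accent agreement."""
    def combined(h: Hypothesis) -> float:
        return h.asr_score + weight * accent_agreement(
            h.acoustic_accent, h.lexical_accent
        )
    return sorted(nbest, key=combined, reverse=True)


# Toy n-best list: the recognizer slightly prefers hyp_a, but its lexical
# accent pattern disagrees with the acoustic evidence.
hyp_a = Hypothesis(["recognize", "speech"], -10.0, [0.9, 0.1], [0.1, 0.9])
hyp_b = Hypothesis(["wreck", "beach"], -10.5, [0.9, 0.1], [0.8, 0.2])

best = rescore([hyp_a, hyp_b], weight=1.0)[0]
print(" ".join(best.words))
```

Because the accent score is computed outside the recognizer and only added at rescoring time, this sketch mirrors the decoupling the abstract describes. In practice the weight would be tuned on held-out data; with `weight=0.0` the original recognizer ranking is returned unchanged.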