IEEE Transactions on Audio, Speech, and Language Processing
In this paper, we adopt an n-best rescoring scheme that uses pitch-accent patterns to improve automatic speech recognition (ASR) performance. The pitch-accent model is decoupled from the main ASR system, which allows us to develop it independently. N-best hypotheses from the recognizers are rescored with additional scores that measure the correlation between the pitch-accent patterns derived from the acoustic signal and those derived from lexical cues. To test the robustness of our algorithm, we use two different data sets and recognition setups. The first is English radio news data that has pitch-accent labels, with a recognizer that is trained on a small amount of data and has a high error rate. The second is English broadcast news data, recognized with a state-of-the-art SRI recognizer. Our experimental results show that our approach achieves a relative word error rate reduction of about 3%. This gain is consistent across the two tests, which suggests that incorporating prosodic information is a promising direction for improving speech recognition.
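As an illustration of the general idea of n-best rescoring with a decoupled pitch-accent score, the following is a minimal sketch and not the paper's actual model. The `Hypothesis` fields, the agreement formula in `accent_agreement`, the linear combination in `rescore`, and the `weight` parameter are all assumptions made for this example; the abstract does not specify how the correlation score is computed or combined with the recognizer score.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One entry of an n-best list (all fields are hypothetical inputs)."""
    words: List[str]
    asr_score: float              # log-domain score from the recognizer
    acoustic_accent: List[float]  # assumed P(accent | acoustics), one per word
    lexical_accent: List[float]   # assumed P(accent | lexical cues), one per word


def accent_agreement(hyp: Hypothesis, eps: float = 1e-6) -> float:
    """Mean log-probability that the acoustic and lexical predictors agree.

    Assumption: per word, agreement is the probability that both predictors
    say "accented" or both say "unaccented", treating them as independent.
    """
    if not hyp.words:
        return 0.0
    total = 0.0
    for p_ac, p_lex in zip(hyp.acoustic_accent, hyp.lexical_accent):
        agree = p_ac * p_lex + (1.0 - p_ac) * (1.0 - p_lex)
        total += math.log(max(agree, eps))  # clamp to avoid log(0)
    return total / len(hyp.words)


def rescore(nbest: List[Hypothesis], weight: float = 1.0) -> List[Hypothesis]:
    """Return the n-best list re-ranked by ASR score plus weighted accent score."""
    return sorted(
        nbest,
        key=lambda h: h.asr_score + weight * accent_agreement(h),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy values chosen only to show the re-ranking mechanics.
    nbest = [
        Hypothesis(["the", "whether", "report"], -10.0, [0.1, 0.9, 0.8], [0.1, 0.2, 0.8]),
        Hypothesis(["the", "weather", "report"], -10.2, [0.1, 0.9, 0.8], [0.1, 0.9, 0.8]),
    ]
    best = rescore(nbest, weight=1.0)[0]
    print(" ".join(best.words))
```

Because the accent score is computed outside the recognizer, a sketch like this only needs the n-best list and the two per-word accent predictions. In practice the combination weight would be tuned on held-out data.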