Exploiting Acoustic and Syntactic Features for Automatic Prosody Labeling in a Maximum Entropy Framework

Authors:
V. K. Rangarajan Sridhar;S. Bangalore;S. S. Narayanan
Affiliations:
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2008

Citing 0
Cited 9

Detecting pitch accents at the word, syllable and vowel level
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

Semi-supervised learning for automatic prosodic event detection using co-training algorithm
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2

N-best rescoring based on pitch-accent patterns
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

Analysis of inconsistencies in cross-lingual automatic ToBI tonal accent labeling
TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue

Cross-lingual English Spanish tonal accent labeling using decision trees and neural networks
NOLISP'11 Proceedings of the 5th international conference on Advances in nonlinear speech processing

Automatic prosodic event detection using a novel labeling and selection method in co-training
Speech Communication

Analysis of inter-transcriber consistency in the Cat_ToBI prosodic labeling system
Speech Communication

Enriching machine-mediated speech-to-speech translation using contextual information
Computer Speech and Language

A fuzzy classifier to deal with similarity between labels on automatic prosodic labeling
Computer Speech and Language

Abstract

In this paper, we describe a maximum entropy-based automatic prosody labeling framework that exploits both language and speech information. We apply the framework to both prominence and phrase structure detection within the Tones and Break Indices (ToBI) annotation scheme. The framework uses novel syntactic features in the form of supertags and a quantized acoustic-prosodic feature representation that is similar to linear parameterizations of the prosodic contour. The model is trained discriminatively and is robust in selecting appropriate features for the task of prosody detection. The maximum entropy acoustic-syntactic model achieves pitch accent and boundary tone detection accuracies of 86.0% and 93.1% on the Boston University Radio News corpus, and 79.8% and 90.3% on the Boston Directions corpus. Phrase structure detection through prosodic break index labeling yields accuracies of 84% and 87% on the two corpora, respectively. These results are significantly better than previously reported results and demonstrate the strength of the maximum entropy model in jointly modeling simple lexical, syntactic, and acoustic features for automatic prosody labeling.
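
The general idea of a maximum entropy labeler over mixed lexical, syntactic, and quantized acoustic features can be illustrated with a small sketch. The abstract does not give the paper's feature templates, quantization scheme, or training procedure, so everything below is an assumption made for illustration: the function names (`quantize`, `featurize`, `train`), the bin ranges, the use of part-of-speech tags in place of supertags, the gradient-ascent trainer, and the four-word toy data set are all hypothetical.

```python
import math
from collections import defaultdict


def quantize(value, lo, hi, n_bins):
    """Map a continuous value to a bin index in [0, n_bins - 1], clamping outliers."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    frac = (value - lo) / (hi - lo)
    return min(n_bins - 1, max(0, int(frac * n_bins)))


def featurize(word, pos, f0_mean, energy):
    """Turn one word into binary indicator features (hypothetical templates)."""
    return [
        "bias",
        "word=" + word.lower(),                        # lexical feature
        "pos=" + pos,                                  # stand-in for a syntactic feature
        "f0_bin=%d" % quantize(f0_mean, 50.0, 400.0, 5),   # quantized pitch (assumed Hz range)
        "energy_bin=%d" % quantize(energy, 0.0, 1.0, 4),   # quantized energy (assumed 0..1)
    ]


def predict_proba(weights, labels, feats):
    """Conditional maximum entropy distribution p(label | feats) via a softmax."""
    scores = [sum(weights[(f, y)] for f in feats) for y in labels]
    top = max(scores)                                  # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(data, labels, epochs=200, lr=0.3, l2=0.01):
    """Batch gradient ascent on the L2-regularized conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in data:
            probs = predict_proba(weights, labels, feats)
            for y, p in zip(labels, probs):
                delta = (1.0 if y == gold else 0.0) - p   # observed minus expected count
                for f in feats:
                    grad[(f, y)] += delta
        for key, g in grad.items():
            weights[key] += lr * (g / len(data) - l2 * weights[key])
    return weights


# Toy, made-up examples: (word, POS, mean f0, normalized energy) -> accent label.
LABELS = ["accent", "none"]
TOY = [
    (featurize("Boston", "NNP", 260.0, 0.8), "accent"),
    (featurize("the", "DT", 120.0, 0.2), "none"),
    (featurize("news", "NN", 240.0, 0.7), "accent"),
    (featurize("of", "IN", 110.0, 0.3), "none"),
]

weights = train(TOY, LABELS)
probs = predict_proba(weights, LABELS, featurize("radio", "NN", 250.0, 0.75))
predicted = LABELS[probs.index(max(probs))]
```

In this sketch the unseen word "radio" has no lexical weight of its own, so its label is driven by the syntactic and quantized acoustic indicators it shares with the training words. That is the sense in which a single log-linear model can combine heterogeneous feature types; the accuracies quoted in the abstract come from the authors' system, not from anything resembling this toy.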