Automatic prosodic event detection using a novel labeling and selection method in co-training

Authors:
Je Hun Jeon;Yang Liu
Affiliations:
Department of Computer Science, The University of Texas at Dallas, Richardson, TX, USA;Department of Computer Science, The University of Texas at Dallas, Richardson, TX, USA
Venue:
Speech Communication
Year:
2012

Citing 20
Cited 0

Pitch accent in context: predicting intonational prominence from text

Artificial Intelligence - Special volume on natural language processing
Combining labeled and unlabeled data with co-training

COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Analyzing the effectiveness and applicability of co-training

Proceedings of the ninth international conference on Information and knowledge management
Active + Semi-supervised Learning = Robust Multi-View Learning

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Enhancing Supervised Learning with Unlabeled Data

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Selective Sampling with Redundant Views

Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
TnT: a statistical part-of-speech tagger

ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Use of support vector learning for chunk identification

ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Bootstrapping POS taggers using unlabelled data

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Using conditional random fields to predict pitch accents in conversational speech

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Unsupervised and semi-supervised learning of tone and pitch accent

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
A note on Platt's probabilistic outputs for support vector machines

Machine Learning
Automatic prosodic events detection using syllable-based acoustic and syntactic features

ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
On the syllabification of phonemes

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Semi-supervised learning for automatic prosodic event detection using co-training algorithm

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
N-best rescoring based on pitch-accent patterns

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Modeling Prosodic Features With Joint Factor Analysis for Speaker Verification

IEEE Transactions on Audio, Speech, and Language Processing
Prosody dependent speech recognition on radio news corpus of American English

IEEE Transactions on Audio, Speech, and Language Processing
Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence

IEEE Transactions on Audio, Speech, and Language Processing
Exploiting Acoustic and Syntactic Features for Automatic Prosody Labeling in a Maximum Entropy Framework

IEEE Transactions on Audio, Speech, and Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Most previous approaches to automatic prosodic event detection are based on supervised learning, relying on the availability of a corpus that is annotated with the prosodic labels of interest in order to train the classification models. However, creating such resources is an expensive and time-consuming task. In this paper, we exploit semi-supervised learning with the co-training algorithm for automatic detection of coarse-level representation of prosodic events such as pitch accent, intonational phrase boundaries, and break indices. Since co-training works on the condition that the views are compatible and uncorrelated, and real data often do not satisfy these conditions, we propose a method to label and select examples in co-training. In our experiments on the Boston University radio news corpus, when using only a small amount of the labeled data as the initial training set, our proposed labeling method can effectively use unlabeled data to improve performance and finally reach performance close to the results of the supervised method using more labeled data. We perform a thorough analysis of various factors impacting the learning curves, including labeling error rate and informativeness of added examples, performance of the individual classifiers and their difference, and the initial and added data size.