Learning to predict pitch accents and prosodic boundaries in Dutch

Authors:
Erwin Marsi;Martin Reynaert;Antal van den Bosch;Walter Daelemans;Véronique Hoste
Affiliations:
Tilburg University, Tilburg, The Netherlands;Tilburg University, Tilburg, The Netherlands;Tilburg University, Tilburg, The Netherlands;University of Antwerp, Antwerp, Belgium;University of Antwerp, Antwerp, Belgium
Venue:
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Year:
2003

Abstract

We train a decision tree inducer (CART) and a memory-based classifier (MBL) to predict prosodic pitch accents and breaks in Dutch text on the basis of shallow, easy-to-compute features. We train the algorithms on each task individually and on the two tasks simultaneously. The parameters of both algorithms and the selection of features are optimized per task with iterative deepening, an efficient wrapper procedure that uses progressive sampling of training data. Results show a consistent, significant advantage of MBL over CART, and also indicate that the tasks can be combined at the cost of little loss in generalization score. On cross-validated data and on held-out data, MBL yields F-scores of 84 and 87, respectively, on accent placement, and of 88 and 91, respectively, on breaks. Accent placement is shown to outperform an informed baseline rule; reliably predicting breaks other than those already indicated by intra-sentential punctuation, however, appears to be more challenging.
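
To make the ingredients named in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a memory-based classifier can be approximated by k-nearest-neighbour voting under a simple overlap distance, and that iterative deepening can be approximated by scoring candidate settings on a small training sample, keeping the better half, and doubling the sample. The halving schedule, the function names, the (word, part-of-speech) features, and the labels "A" (accent) and "-" (no accent) are all hypothetical choices made for illustration.

```python
from collections import Counter


def overlap_knn_predict(train, instance, k):
    """Majority vote among the k training instances with the fewest
    mismatching feature values (overlap distance). Assumed stand-in for MBL."""
    ranked = sorted(
        train, key=lambda ex: sum(a != b for a, b in zip(ex[0], instance))
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def f_score(gold, predicted, positive):
    """Harmonic mean of precision and recall for one target label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def iterative_deepening(train, held_out, k_values, start_size, positive):
    """Hypothetical wrapper loop: score each candidate k on a growing
    training sample, keep the better half, and repeat until one remains."""
    candidates = list(k_values)
    size = start_size
    gold = [label for _, label in held_out]
    while len(candidates) > 1:
        sample = train[: min(size, len(train))]
        scored = []
        for k in candidates:
            predicted = [overlap_knn_predict(sample, x, k) for x, _ in held_out]
            scored.append((f_score(gold, predicted, positive), k))
        # Best F-score first; ties are broken in favour of the smaller k.
        scored.sort(key=lambda item: (-item[0], item[1]))
        candidates = [k for _, k in scored[: max(1, len(scored) // 2)]]
        size *= 2
    return candidates[0]


# Toy data: (word, part-of-speech) features with an accent label.
toy_train = [
    (("de", "Art"), "-"),
    (("kat", "N"), "A"),
    (("hond", "N"), "A"),
    (("een", "Art"), "-"),
    (("loopt", "V"), "A"),
    (("en", "Conj"), "-"),
]
toy_held_out = [(("muis", "N"), "A"), (("het", "Art"), "-")]
best_k = iterative_deepening(toy_train, toy_held_out, [1, 3, 5], 2, "A")
```

The sketch selects only the neighbourhood size k. The abstract states that feature selection is optimized by the same procedure, which would mean enlarging the candidate pool to combinations of parameter settings and feature subsets.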