Learning to predict pitch accents and prosodic boundaries in Dutch

  • Authors:
  • Erwin Marsi; Martin Reynaert; Antal van den Bosch; Walter Daelemans; Véronique Hoste

  • Affiliations:
  • Tilburg University, Tilburg, The Netherlands; Tilburg University, Tilburg, The Netherlands; Tilburg University, Tilburg, The Netherlands; University of Antwerp, Antwerp, Belgium; University of Antwerp, Antwerp, Belgium

  • Venue:
  • ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
  • Year:
  • 2003

Abstract

We train a decision tree inducer (CART) and a memory-based classifier (MBL) to predict prosodic pitch accents and breaks in Dutch text on the basis of shallow, easy-to-compute features. We train the algorithms both on each task individually and on the two tasks simultaneously. The parameters of both algorithms and the selection of features are optimized per task with iterative deepening, an efficient wrapper procedure that uses progressive sampling of training data. Results show a consistent, significant advantage of MBL over CART, and also indicate that the two tasks can be combined with little loss in generalization score. Tests on cross-validated data and on held-out data yield MBL F-scores of 84 and 87, respectively, on accent placement, and of 88 and 91, respectively, on breaks. The accent-placement classifier is shown to outperform an informed baseline rule; reliably predicting breaks other than those already indicated by intra-sentential punctuation, however, appears to be more challenging.
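The abstract names a memory-based learner and the iterative-deepening wrapper without spelling out how they fit together. The sketch below is a simplified, hypothetical illustration, not the authors' implementation. A plain k-nearest-neighbour classifier with an overlap metric stands in for MBL; the abstract does not describe the actual MBL software, distance metrics, or feature weighting. The wrapper loop is likewise assumed: each round keeps the best half of the surviving candidate settings (`keep=0.5`) and doubles the training sample, starting from a hypothetical 50 instances. A candidate setting pairs a value of k with a feature subset, and each candidate is scored by F-score on a separate validation set. All function names are illustrative.

```python
from collections import Counter


def overlap_distance(a, b):
    """Number of mismatching symbolic feature values (overlap metric)."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, instance, k):
    """Majority label among the k training examples nearest to `instance`.

    `train` is a list of (feature_tuple, label) pairs. Distance ties are
    broken by training order, vote ties by first occurrence.
    """
    nearest = sorted(train, key=lambda ex: overlap_distance(ex[0], instance))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def f_score(gold, pred, positive):
    """F1 score for the `positive` class (e.g. "accent"), as a fraction."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(setting, train, validation, positive):
    """Train with one (k, feature_indices) setting and score it on `validation`."""
    k, features = setting

    def project(x):
        # Keep only the selected feature positions.
        return tuple(x[i] for i in features)

    train_p = [(project(x), y) for x, y in train]
    preds = [knn_predict(train_p, project(x), k) for x, _ in validation]
    return f_score([y for _, y in validation], preds, positive)


def iterative_deepening(settings, train, validation, positive, start=50, keep=0.5):
    """Progressive-sampling wrapper (simplified, hypothetical schedule).

    Each round scores every surviving setting on a growing prefix of `train`,
    keeps the best `keep` fraction, and doubles the sample size. Assumes
    `train` is already in random order, so each prefix is a random sample.
    """
    size = start
    while True:
        sample = train[:min(size, len(train))]
        scores = {s: evaluate(s, sample, validation, positive) for s in settings}
        # Stable sort: equally scored settings keep their earlier order.
        settings = sorted(settings, key=scores.get, reverse=True)
        if len(settings) == 1 or len(sample) == len(train):
            return settings[0]
        settings = settings[:max(1, int(len(settings) * keep))]
        size *= 2
```

Because most candidates are eliminated while the training sample is still small, most evaluations are cheap, and only the few survivors are trained on the full data. This is the kind of saving that progressive sampling is meant to deliver. The scores here are fractions in [0, 1], whereas the abstract reports F-scores scaled by 100.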