Determining the position of breaks in a sentence is a key task for a text-to-speech system. A synthesized sentence containing incorrect breaks at best requires increased listening effort and at worst may have lower intelligibility and different semantics from a correctly phrased sentence. In addition, the position of breaks must be known before other components of the sentence's prosodic structure can be determined. We consider methods for phrase break prediction in which the whole sentence is analysed, in contrast to most previous work, which has focused on analysing the area around an individual juncture. One of the main features we use is part-of-speech tags. First, we report an algorithm that reduces the number of tags in the tagset whilst improving break prediction accuracy. We then describe three approaches to break prediction: by analogy, in which we find the sentence in our training data that best matches the unseen sentence; by phrase modelling, in which we build stochastic models of phrases and use these, together with a "phrase grammar", to segment the unseen sentence; and finally, by using features derived from a syntactic parse of the sentence. All techniques perform well above our baseline, which uses punctuation symbols to determine break positions, and performance increases with each successive technique. Our best result, obtained on the MARSEC corpus using a combination of parse-tree-derived features and a local feature, is an F score of 81.6%, which we believe to be the highest published on this dataset.
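As an illustration of the punctuation baseline and the F score mentioned above, the following is a minimal Python sketch rather than the authors' implementation. It assumes that a break is placed at the juncture after any word followed by a punctuation token, and that the F score is the harmonic mean of precision and recall over break junctures; the abstract does not spell out either detail. The function names `punctuation_baseline` and `f_score`, the example sentence and the reference breaks are all hypothetical.

```python
import string

# Assumption: a token made up only of these characters counts as punctuation.
PUNCTUATION = set(string.punctuation)


def is_punct(token):
    """True if the token is non-empty and consists only of punctuation."""
    return bool(token) and all(ch in PUNCTUATION for ch in token)


def punctuation_baseline(tokens):
    """Predict breaks from punctuation alone.

    Returns (words, breaks), where breaks[i] is True if a break is
    predicted at the juncture following words[i].
    """
    words, breaks = [], []
    for token in tokens:
        if is_punct(token):
            # Mark a break after the most recent word, if there is one.
            if breaks:
                breaks[-1] = True
        else:
            words.append(token)
            breaks.append(False)
    return words, breaks


def f_score(predicted, reference):
    """Harmonic mean of precision and recall over break junctures."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if r and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the reference also has a break after "man",
# which the punctuation baseline cannot find.
tokens = ["The", "old", "man", "sat", "down", ",", "and", "slept", "."]
words, predicted = punctuation_baseline(tokens)
reference = [False, False, True, False, True, False, True]
print(words)
print(predicted)
print(round(f_score(predicted, reference), 3))
```

In this toy case the baseline finds two of the three reference breaks and predicts no false ones, so precision is 1 and recall is 2/3. Whether sentence-final junctures are counted in the evaluation is a further assumption of the sketch.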