Reducing the annotation effort for letter-to-phoneme conversion

Authors:
Kenneth Dwyer;Grzegorz Kondrak
Affiliations:
University of Alberta, Edmonton, AB, Canada;University of Alberta, Edmonton, AB, Canada
Venue:
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1
Year:
2009

Abstract

Letter-to-phoneme (L2P) conversion is the process of producing a correct phoneme sequence for a word, given its letters. It is often desirable to reduce the quantity of training data, and hence the human annotation, needed to train an L2P classifier for a new language. In this paper, we confront the challenge of building an accurate L2P classifier with a minimal amount of training data by combining several diverse techniques: context ordering, letter clustering, active learning, and phonetic L2P alignment. Experiments on six languages show up to a 75% reduction in annotation effort.
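
The abstract names active learning as one of the combined techniques but does not say which query strategy is used. As a rough illustration of the general idea only, the sketch below shows pool-based selection with a bagged committee and vote entropy. Everything in it is an assumption made for the example: the function names (`build_committee`, `vote_entropy`, `select_query`), the toy per-letter lookup model, and the one-letter-to-one-phoneme alignment are hypothetical and are not the paper's classifier or method.

```python
import math
import random
from collections import Counter


def train_member(labeled, rng):
    """Train one toy committee member on a bootstrap sample (bagging).

    ASSUMPTION: each word is aligned one letter to one phoneme, and the
    'model' is just the most frequent phoneme seen for each letter.
    """
    sample = [rng.choice(labeled) for _ in labeled]
    counts = {}
    for word, phonemes in sample:
        for letter, phoneme in zip(word, phonemes):
            counts.setdefault(letter, Counter())[phoneme] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}


def build_committee(labeled, size=5, seed=0):
    """Build a committee of models, each trained on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_member(labeled, rng) for _ in range(size)]


def predict(model, word):
    """Predict a phoneme tuple for a word; unseen letters map to '?'."""
    return tuple(model.get(letter, "?") for letter in word)


def vote_entropy(predictions):
    """Entropy of the committee's votes; 0 when all members agree."""
    total = len(predictions)
    counts = Counter(predictions)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def select_query(pool, committee):
    """Return the unlabeled word on which the committee disagrees most."""
    return max(pool, key=lambda w: vote_entropy([predict(m, w) for m in committee]))
```

In a full loop, the selected word would be sent to a human annotator, added to the labeled set, and the committee retrained; the annotation saving comes from labeling only the words the current models find most uncertain.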