NETtalk: a parallel network that learns to read aloud
Neurocomputing: foundations of research
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Finite state methods for hyphenation
Natural Language Engineering
Shallow parsing with conditional random fields
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
ACM SIGDOC Asterisk Journal of Computer Documentation
The Unreasonable Effectiveness of Data
IEEE Intelligent Systems
Confidence estimation for information extraction
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Hi-index | 0.00 |
Finding allowable places in words to insert hyphens is an important practical problem. The algorithm that is used most often nowadays has remained essentially unchanged for 25 years. This method is the TEX hyphenation algorithm of Knuth and Liang. We present here a hyphenation method that is clearly more accurate. The new method is an application of conditional random fields. We create new training sets for English and Dutch from the CELEX European lexical resource, and achieve error rates for English of less than 0.1% for correctly allowed hyphens, and less than 0.01% for Dutch. Experiments show that both the Knuth/Liang method and a leading current commercial alternative have error rates several times higher for both languages.