Conditional random fields for word hyphenation

Authors:
Nikolaos Trogkanis;Charles Elkan
Affiliations:
University of California, San Diego, La Jolla, California;University of California, San Diego, La Jolla, California
Venue:
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Year:
2010

Abstract

Finding allowable places in words to insert hyphens is an important practical problem. The algorithm used most often today, the TeX hyphenation algorithm of Knuth and Liang, has remained essentially unchanged for 25 years. We present here a hyphenation method that is clearly more accurate. The new method is an application of conditional random fields. We create new training sets for English and Dutch from the CELEX European lexical resource, and achieve error rates for English of less than 0.1% for correctly allowed hyphens, and of less than 0.01% for Dutch. Experiments show that both the Knuth/Liang method and a leading current commercial alternative have error rates several times higher for both languages.
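
The abstract does not spell out how words are encoded for the conditional random field. As a minimal sketch, assuming one binary label per letter (1 if a hyphen may follow that letter, 0 otherwise), hyphenation can be cast as sequence labeling as shown below. The function names `to_labels` and `from_labels`, the label scheme, and the example word are illustrative assumptions, not taken from the paper.

```python
def to_labels(hyphenated, sep="-"):
    """Split a hyphenated word into (plain word, per-letter labels).

    labels[i] is 1 if a hyphen is allowed directly after letter i, else 0.
    This binary scheme is an assumption made for illustration.
    """
    letters = []
    labels = []
    for ch in hyphenated:
        if ch == sep:
            # Mark the preceding letter as a legal break point.
            if labels:
                labels[-1] = 1
        else:
            letters.append(ch)
            labels.append(0)
    return "".join(letters), labels


def from_labels(word, labels, sep="-"):
    """Rebuild the hyphenated form from a word and its per-letter labels."""
    out = []
    last = len(word) - 1
    for i, (ch, lab) in enumerate(zip(word, labels)):
        out.append(ch)
        # Never emit a hyphen after the final letter.
        if lab == 1 and i < last:
            out.append(sep)
    return "".join(out)


word, labels = to_labels("hy-phen-ation")
print(word)    # hyphenation
print(labels)  # [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(from_labels(word, labels))  # hy-phen-ation
```

Under this framing, a sequence model would be trained to predict the label list from the letters of the plain word, and a predicted 1 where the reference has 0 would count as a wrongly allowed hyphen.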