Integrating joint n-gram features into a discriminative training framework

  • Authors:
  • Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak

  • Affiliations:
  • University of Alberta, Edmonton, AB, Canada; National Research Council Canada, Ottawa, ON, Canada; University of Alberta, Edmonton, AB, Canada

  • Venue:
  • HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
  • Year:
  • 2010


Abstract

Phonetic string transduction problems, such as letter-to-phoneme conversion and name transliteration, have recently received much attention in the NLP community. In the past few years, two methods have come to dominate as solutions to supervised string transduction: generative joint n-gram models and discriminative sequence models. Both approaches benefit from their ability to consider large, flexible spans of source context when making transduction decisions. However, they encode this context in different ways, providing their respective models with different information. To combine the strengths of these two systems, we include joint n-gram features inside a state-of-the-art discriminative sequence model. We evaluate our approach on several letter-to-phoneme and transliteration data sets. Our results indicate an improvement in overall performance with respect to both the joint n-gram approach and traditional feature sets for discriminative models.
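
To make the central idea concrete, the following is a minimal Python sketch, not the authors' implementation, of what joint n-gram features over an aligned letter-phoneme sequence might look like. A joint n-gram is a contiguous window of aligned (source substring, target substring) operations; the alignment representation and the feature-name scheme below are assumptions made purely for illustration.

  # Hypothetical sketch (not the paper's code): enumerate joint n-gram
  # indicator features over an aligned letter-phoneme sequence. Each
  # aligned operation pairs a source substring with a target substring.

  def joint_ngram_features(aligned_pairs, max_n=3):
      """Enumerate joint n-gram features up to length max_n.

      aligned_pairs: list of (letters, phonemes) tuples, e.g. for "phone":
      [("ph", "f"), ("o", "ow"), ("n", "n"), ("e", "")].
      """
      features = []
      for n in range(1, max_n + 1):
          for i in range(len(aligned_pairs) - n + 1):
              window = aligned_pairs[i:i + n]
              # Feature name encodes each operation as letters:phonemes.
              name = "J{}=".format(n) + "_".join(
                  "{}:{}".format(s, t) for s, t in window)
              features.append(name)
      return features

  # Example: joint unigram and bigram features for "phone" -> /f ow n/.
  print(joint_ngram_features(
      [("ph", "f"), ("o", "ow"), ("n", "n"), ("e", "")], max_n=2))

In a discriminative framework such as the one the abstract describes, indicator features of this kind would fire on candidate alignments during decoding, letting the learned weights capture joint source-target context alongside the traditional source-side features.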