On the limited memory BFGS method for large scale optimization
Mathematical Programming: Series A and B
Learning morpho-lexical probabilities from an untagged corpus with an application to Hebrew
Computational Linguistics
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Multitiered nonlinear morphology using multitape finite automata: a case study on Syriac and Arabic
Computational Linguistics - Special issue on finite-state methods in NLP
Statistical morphological disambiguation for agglutinative languages
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Serial combination of rules and statistics: a case study in Czech tagging
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Shallow parsing with conditional random fields
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Building a shallow Arabic Morphological Analyzer in one day
SEMITIC '02 Proceedings of the ACL-02 workshop on Computational approaches to semitic languages
Logarithmic opinion pools for conditional random fields
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Chinese segmentation and new word detection using conditional random fields
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Automatic tagging of Arabic text: from raw text to base phrase chunks
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
From Czech morphology through partial parsing to disambiguation
CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing
Compiling Comp Ling: practical weighted dynamic programming and the Dyna language
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Computational Linguistics
HunPos: an open source trigram tagger
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
The best of two worlds: cooperation of statistical and rule-based taggers for Czech
ACL '07 Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies
Context-based Arabic morphological analysis for machine translation
CoNLL '08 Proceedings of the Twelfth Conference on Computational Natural Language Learning
Arabic diacritization through full morphological tagging
NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Web-based frequency dictionaries for medium density languages
WAC '06 Proceedings of the 2nd International Workshop on Web as Corpus
A new approach to lexical disambiguation of Arabic text
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
A probabilistic morphological analyzer for Syriac
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
A discriminative model for joint morphological disambiguation and dependency parsing
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Discovering morphological paradigms from plain text using a Dirichlet process mixture model
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Feature-rich part-of-speech tagging for morphologically complex languages: application to Bulgarian
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Word segmentation, unknown-word resolution, and morphological agreement in a Hebrew parsing system
Computational Linguistics
Finite-state approaches have been highly successful at describing the morphological processes of many languages. Such approaches have largely focused on modeling the phone- or character-level processes that generate candidate lexical types, rather than tokens in context. For the full analysis of words in context, disambiguation is also required (Hakkani-Tür et al., 2000; Hajič et al., 2001). In this paper, we apply a novel source-channel model to the problem of morphological disambiguation (segmentation into morphemes, lemmatization, and POS tagging) for concatenative, templatic, and inflectional languages. The channel model exploits an existing morphological dictionary, constraining each word's analysis to be linguistically valid. The source model is a factored, conditionally-estimated random field (Lafferty et al., 2001) that learns to disambiguate the full sentence by modeling local contexts. Compared with baseline state-of-the-art methods, our method achieves statistically significant error rate reductions on Korean, Arabic, and Czech, for various training set sizes and accuracy measures.
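
The division of labor described above (a dictionary that restricts each word to its valid analyses, and a sequence model that picks among them using local context) can be illustrated with a minimal sketch. This is not the paper's model: the dictionary entries, the tag names, the hand-set bigram weights, and the `disambiguate` function are all hypothetical, and a real conditionally-estimated random field would learn its weights from data and use much richer features than adjacent tag pairs.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: each surface word maps to the analyses
# it licenses. Analyses outside this list are never considered.
DICTIONARY: Dict[str, List[str]] = {
    "the": ["DET"],
    "saw": ["NOUN", "VERB"],
    "dog": ["NOUN"],
}

# Hypothetical hand-set log-space weights over adjacent analyses.
# In a trained model these would be learned, not written by hand.
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("<s>", "DET"): 1.0,
    ("DET", "NOUN"): 2.0,
    ("DET", "VERB"): -1.0,
    ("NOUN", "VERB"): 1.5,
    ("NOUN", "NOUN"): 0.2,
    ("VERB", "DET"): 1.0,
}


def disambiguate(words: List[str]) -> List[str]:
    """Viterbi search restricted to dictionary-licensed analyses."""
    # best[a] = (score, path) for the highest-scoring path ending in a.
    best: Dict[str, Tuple[float, List[str]]] = {"<s>": (0.0, [])}
    for word in words:
        new_best: Dict[str, Tuple[float, List[str]]] = {}
        # The dictionary constrains the search space for this word.
        for cand in DICTIONARY[word]:
            # Unlisted tag pairs get a neutral weight of 0.0.
            new_best[cand] = max(
                (score + WEIGHTS.get((prev, cand), 0.0), path + [cand])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]


print(disambiguate(["the", "dog", "saw", "the", "dog"]))
```

In this toy setting the ambiguous word "saw" is resolved differently depending on its neighbors: after "the" it is scored as a noun, while after "dog" it is scored as a verb. A word missing from the toy dictionary raises a `KeyError`; handling unknown words is outside the scope of the sketch.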