An unsupervised morpheme-based HMM for Hebrew morphological disambiguation

Authors:
Meni Adler; Michael Elhadad
Affiliations:
Ben Gurion University of the Negev, Beer Sheva, Israel; Ben Gurion University of the Negev, Beer Sheva, Israel
Venue:
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Year:
2006

Abstract

Morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text. When a word is ambiguous (it has several possible analyses), a disambiguation procedure based on the word's context must be applied. This paper deals with morphological disambiguation of Hebrew, which combines morphemes into a word in both agglutinative and fusional ways. We present an unsupervised stochastic model (the only resource we use is a morphological analyzer) that deals with the data sparseness problem caused by the affixational morphology of Hebrew. We present a text encoding method for languages with affixational morphology in which knowledge of the word formation rules (which are quite restricted in Hebrew) helps in the disambiguation. We adapt HMM algorithms for learning and searching this text representation in such a way that segmentation and tagging can be learned in parallel, in one step. Results of a large-scale evaluation indicate that this learning improves disambiguation for complex tag sets. Our method is applicable to other languages with affixational morphology.
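
To make the idea of searching a morpheme-level representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a first-order (bigram) HMM over morpheme tags and only performs decoding with fixed parameters; the unsupervised parameter estimation described in the abstract is not shown. The transliterated tokens, the tag names, the candidate analyses, the probability tables, and the function names `log_score` and `viterbi` are all hypothetical and chosen for illustration. Each word comes with a list of candidate analyses, as a morphological analyzer might produce, and the search picks one analysis per word, so segmentation and tagging are decided in the same pass.

```python
import math

# Hypothetical analyzer output for two toy transliterated words.
# Each word has candidate analyses; each analysis is a list of (morpheme, tag).
LATTICE = [
    [[("b", "PREP"), ("bit", "NOUN")], [("bbit", "NOUN")]],
    [[("gdwl", "ADJ")], [("gdwl", "VERB")]],
]

# Invented toy parameters (assumptions, not values from the paper).
TRANS = {("<s>", "PREP"): 0.4, ("<s>", "NOUN"): 0.3, ("PREP", "NOUN"): 0.7,
         ("NOUN", "ADJ"): 0.4, ("NOUN", "VERB"): 0.2}
EMIT = {("PREP", "b"): 0.3, ("NOUN", "bit"): 0.01, ("NOUN", "bbit"): 0.0001,
        ("ADJ", "gdwl"): 0.02, ("VERB", "gdwl"): 0.001}
FLOOR = 1e-6  # crude smoothing for unseen events


def log_score(analysis, prev_tag, trans, emit):
    """Log-probability of one analysis given the tag that precedes it."""
    total = 0.0
    for morpheme, tag in analysis:
        total += math.log(trans.get((prev_tag, tag), FLOOR))
        total += math.log(emit.get((tag, morpheme), FLOOR))
        prev_tag = tag
    return total


def viterbi(lattice, trans, emit, start_tag="<s>"):
    """Pick one analysis per word, maximizing the morpheme-level HMM score."""
    chart = []  # chart[i][k] = (best log-prob ending in analysis k, backpointer)
    for i, candidates in enumerate(lattice):
        column = []
        for analysis in candidates:
            if i == 0:
                column.append((log_score(analysis, start_tag, trans, emit), None))
            else:
                # The last tag of the previous analysis conditions this one.
                column.append(max(
                    (chart[i - 1][j][0]
                     + log_score(analysis, prev[-1][1], trans, emit), j)
                    for j, prev in enumerate(lattice[i - 1])
                ))
        chart.append(column)
    # Follow backpointers from the best final analysis.
    k = max(range(len(chart[-1])), key=lambda j: chart[-1][j][0])
    path = []
    for i in range(len(lattice) - 1, -1, -1):
        path.append(lattice[i][k])
        k = chart[i][k][1]
    return path[::-1]


best = viterbi(LATTICE, TRANS, EMIT)
```

With these toy numbers the search prefers the segmented reading of the first word, because the product of the morpheme-level probabilities exceeds that of the unsegmented reading. In a real setting the tables would be estimated from untagged text rather than written by hand.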