Learning morpho-lexical probabilities from an untagged corpus with an application to Hebrew

Authors:
Moshe Levinger;Alon Itai;Uzzi Ornan
Affiliations:
Haifa Research Laboratory;Technion;Technion
Venue:
Computational Linguistics
Year:
1995

Abstract

This paper proposes a new approach to acquiring morpho-lexical probabilities from an untagged corpus. The approach extracts useful and nontrivial information that would otherwise require laborious manual tagging of large corpora. The paper describes the use of these morpho-lexical probabilities as an information source for morphological disambiguation in Hebrew. The method depends primarily on the following property: a lexical entry in Hebrew may have many different word forms, some of which are ambiguous and some of which are not. Thus, an ambiguous word can be disambiguated using other word forms of the same lexical entry. Although the method was originally devised and implemented to deal with morphological ambiguity in Hebrew, the basic idea can be extended to handle similar problems in other languages with rich morphology.
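
The property described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's actual procedure: it assumes that each candidate analysis of an ambiguous word comes with a hand-supplied list of other, unambiguous word forms of the same lexical entry, and that the relative corpus frequency of those forms is a reasonable stand-in for the probability of the analysis. The function name `estimate_morpho_lexical_probs`, the analysis labels, and the toy tokens are all hypothetical.

```python
from collections import Counter


def estimate_morpho_lexical_probs(corpus_tokens, similar_words):
    """Approximate the probability of each analysis of one ambiguous word.

    corpus_tokens: iterable of word forms from an untagged corpus.
    similar_words: dict mapping each candidate analysis (hypothetical label)
        to a list of other word forms of the same lexical entry that are
        assumed to be unambiguous.
    """
    counts = Counter(corpus_tokens)

    # Evidence for an analysis = total corpus count of its unambiguous forms.
    evidence = {
        analysis: sum(counts[form] for form in forms)
        for analysis, forms in similar_words.items()
    }

    total = sum(evidence.values())
    if total == 0:
        # No evidence in the corpus: fall back to a uniform distribution
        # (an assumption of this sketch, not taken from the paper).
        return {analysis: 1.0 / len(evidence) for analysis in evidence}

    # Normalize the counts so the estimates sum to 1.
    return {analysis: count / total for analysis, count in evidence.items()}


# Toy data: placeholder tokens stand in for real word forms.
corpus = ["noun_form_1", "noun_form_1", "noun_form_2", "verb_form_1", "other"]
similar = {
    "noun_reading": ["noun_form_1", "noun_form_2"],
    "verb_reading": ["verb_form_1"],
}
probs = estimate_morpho_lexical_probs(corpus, similar)
```

In this toy corpus the unambiguous forms tied to the noun reading occur three times and the one tied to the verb reading occurs once, so the sketch assigns the noun reading three quarters of the probability mass. No tagged data is consulted at any point, which is the advantage the abstract claims for the approach.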