Can you tag the modal? You should.

Authors:
Yael Netzer; Meni Adler; David Gabay; Michael Elhadad
Affiliations:
Ben Gurion University of the Negev, Israel (all four authors)
Venue:
Semitic '07: Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources
Year:
2007

Abstract

Computational linguistics methods are typically first developed and tested on English. When they are applied to other languages, assumptions drawn from English data are often carried over to the target language. One of the most common such assumptions is that a "standard" part-of-speech (POS) tagset can be used across languages with only slight variations. In this paper, we discuss a specific issue related to the definition of a POS tagset for Modern Hebrew, as an example that clarifies the method through which such variations can be defined. It is widely assumed that Hebrew has no syntactic category of modals. There is, however, an identified class of words that are modal-like in their semantics and can be characterized by distinct syntactic and morphological criteria. We have found wide disagreement among traditional dictionaries on the POS tag attributed to such words. We describe three main approaches to deciding how to tag such words in Hebrew, and we illustrate the impact of each approach on agreement among human taggers and on the accuracy of automatic POS taggers induced for each approach. Finally, we recommend the use of a "modal" tag in Hebrew and provide detailed guidelines for this tag. Our overall conclusion is that tagset definition is a complex task that deserves an appropriate methodology.
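The abstract compares tagging approaches by their effect on human inter-annotator agreement and on tagger accuracy, but it does not state which agreement statistic was used. As a hedged illustration only, the sketch below assumes Cohen's kappa, one common chance-corrected measure, and shows how projecting the same annotations onto a different tagset changes the figure. The tag names (`MODAL`, `VERB`, `NOUN`, `ADJ`), the six-token toy annotations, and the `MODAL` to `VERB` mapping are all hypothetical. They are not the paper's tagset, its data, or any of its three approaches. Token-level tagger accuracy is the same `observed_agreement` computed against a gold standard.

```python
from collections import Counter
from typing import Dict, List, Sequence


def observed_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of tokens on which two tag sequences agree.

    With a gold standard as `a` and tagger output as `b`, this is token accuracy.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("tag sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = observed_agreement(a, b)
    n = len(a)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: probability that both annotators independently pick the same tag.
    p_e = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same tag everywhere; kappa is undefined, treat as perfect.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def remap(tags: Sequence[str], mapping: Dict[str, str]) -> List[str]:
    """Project tags onto another tagset; tags missing from `mapping` are kept as-is."""
    return [mapping.get(t, t) for t in tags]


if __name__ == "__main__":
    # Hypothetical toy annotations of six tokens; NOT data from the paper.
    annotator_1 = ["MODAL", "VERB", "NOUN", "MODAL", "ADJ", "VERB"]
    annotator_2 = ["ADJ", "VERB", "NOUN", "MODAL", "ADJ", "MODAL"]
    print(f"with MODAL tag: kappa = {cohen_kappa(annotator_1, annotator_2):.3f}")  # 0.556

    # Hypothetical mapping that folds the modal tag into verbs.
    collapse = {"MODAL": "VERB"}
    a1, a2 = remap(annotator_1, collapse), remap(annotator_2, collapse)
    print(f"MODAL -> VERB:  kappa = {cohen_kappa(a1, a2):.3f}")  # 0.714
```

On these toy data, merging the tags raises kappa from 0.556 to 0.714 only because one disagreement (`VERB` vs `MODAL`) disappears once the distinction is erased. A coarser tagset that yields higher agreement says nothing by itself about whether the distinction is linguistically warranted. Kappa is used here instead of raw agreement because its chance term depends on the tag distribution, which changes when a tagset is coarsened or refined.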