Retrieving collocations from text: Xtract
Computational Linguistics - Special issue on using large corpora: I
Co-occurrence patterns among collocations: a tool for corpus-based lexical knowledge acquisition
Computational Linguistics
Reusing an ontology to generate numeral classifiers
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Classifiers in Japanese-to-English machine translation
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Corpus-based generation of numeral classifier using phrase alignment
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Infrastructure for standardization of Asian language resources
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Web and corpus methods for Malay count classifier prediction
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
This paper presents an algorithm for selecting an appropriate classifier word for a noun. In Thai, the choice of classifier for a given concrete noun frequently fluctuates, both across the speech community as a whole and among individual speakers. There is no exact rule for classifier selection. The most a rule-based approach can do is provide a default rule that picks a corresponding classifier for each noun. Moreover, a classifier can be registered for each noun only in the case of unit classifiers, because the other classifier types are open-ended and depend on the meaning being represented. We propose a corpus-based method (Biber, 1993; Nagao, 1993; Smadja, 1993) that generates Noun Classifier Associations (NCA) to overcome these problems in classifier assignment and in the semantic construction of noun phrases. The NCA is created statistically from a large corpus and then recomposed under concept hierarchy constraints and frequency of occurrence.
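The general idea of a frequency-based noun-classifier table constrained by a concept hierarchy can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names `build_nca` and `select_classifier`, the `parent_of` mapping, the rule of propagating counts to ancestor concepts, and the toy data (English glosses for nouns, romanized classifiers) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_nca(pairs, parent_of):
    """Count classifier occurrences per noun and per ancestor concept.

    pairs: iterable of (noun, classifier) observations from a corpus.
    parent_of: hypothetical child -> parent mapping for the concept hierarchy.
    """
    nca = defaultdict(Counter)
    for noun, classifier in pairs:
        node, seen = noun, set()
        # Credit the noun itself and every concept above it (assumed rule).
        while node is not None and node not in seen:
            seen.add(node)
            nca[node][classifier] += 1
            node = parent_of.get(node)
    return nca


def select_classifier(noun, nca, parent_of, default=None):
    """Return the most frequent classifier for the noun.

    If the noun was never observed, back off to the nearest ancestor
    concept that has counts; return `default` if none is found.
    """
    node, seen = noun, set()
    while node is not None and node not in seen:
        seen.add(node)
        counts = nca.get(node)
        if counts:
            return counts.most_common(1)[0][0]
        node = parent_of.get(node)
    return default


# Toy observations and hierarchy, invented for illustration only.
toy_pairs = [("dog", "tua"), ("dog", "tua"), ("cat", "tua"), ("teacher", "khon")]
toy_parent_of = {"dog": "animal", "cat": "animal", "bird": "animal", "teacher": "person"}
toy_nca = build_nca(toy_pairs, toy_parent_of)
print(select_classifier("bird", toy_nca, toy_parent_of))
```

In this sketch an unseen noun such as "bird" inherits the dominant classifier of its parent concept, which is one simple way a hierarchy could supply a default where corpus counts are missing.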