This paper presents results from experiments in automatic animacy classification of Norwegian nouns using decision-tree classifiers. The method uses relative frequency measures for linguistically motivated morphosyntactic features extracted from an automatically annotated corpus of Norwegian. The classifiers are evaluated with leave-one-out training and testing. The initial results are promising for high-frequency nouns (approaching 90% accuracy), but accuracy deteriorates gradually as lower-frequency nouns are classified. Experiments that attempt to locate a frequency threshold for the method empirically indicate that a subset of the chosen morphosyntactic features is notably resilient to data sparseness. The results presented show that the classification accuracy obtained for high-frequency nouns (with absolute frequencies >1000) can be maintained for nouns with considerably lower frequencies (~50) by backing off to a smaller set of features at classification time.
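A minimal Python sketch of the kind of pipeline outlined above: each noun is represented by relative frequencies of morphosyntactic features, a small entropy-based decision tree is trained, and accuracy is estimated with leave-one-out evaluation. Backing off is modelled simply by passing a smaller feature list. Several parts are assumptions rather than the paper's method: the feature names (`subj`, `obj`, `gen`, `refl`), the definition of relative frequency as feature count divided by the noun's total frequency, and the tree learner itself, which is a simplified information-gain learner with binary thresholds, not the classifier used in the experiments. The noun counts are invented toy values.

```python
import math
from collections import Counter

# Hypothetical feature names; the paper's actual feature inventory may differ.
FEATURES = ["subj", "obj", "gen", "refl"]


def relative_frequencies(feature_counts, total, features=FEATURES):
    """Assumed measure: feature count divided by the noun's total corpus frequency."""
    return {f: feature_counts.get(f, 0) / total for f in features}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def best_split(rows, labels, features):
    """Return (gain, feature, threshold) for the binary split with the highest information gain."""
    base = entropy(labels)
    best = None
    for f in features:
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if best is None or base - remainder > best[0]:
                best = (base - remainder, f, t)
    return best


def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Grow a tree of nested (feature, threshold, left, right) tuples; leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    split = best_split(rows, labels, features)
    if split is None or split[0] <= 0:
        return majority
    _, f, t = split

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          features, depth + 1, max_depth)

    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t, grow(left), grow(right))


def predict(tree, row):
    """Follow threshold tests from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def leave_one_out_accuracy(rows, labels, features):
    """Hold out each noun in turn, train on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(rows)):
        tree = build_tree(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:], features)
        correct += predict(tree, rows[i]) == labels[i]
    return correct / len(rows)


# Toy counts invented for illustration only; they are not data from the paper.
toy = {
    "mann":   ({"subj": 50, "obj": 10, "gen": 8, "refl": 5}, 100, "animate"),
    "kvinne": ({"subj": 45, "obj": 12, "gen": 6, "refl": 4}, 100, "animate"),
    "lege":   ({"subj": 40, "obj": 15, "gen": 5, "refl": 3}, 100, "animate"),
    "barn":   ({"subj": 35, "obj": 20, "gen": 4, "refl": 2}, 100, "animate"),
    "bil":    ({"subj": 10, "obj": 40, "gen": 1}, 100, "inanimate"),
    "hus":    ({"subj": 5, "obj": 35, "gen": 2}, 100, "inanimate"),
    "bok":    ({"subj": 8, "obj": 45, "gen": 1}, 100, "inanimate"),
    "stein":  ({"subj": 3, "obj": 30}, 100, "inanimate"),
}
rows = [relative_frequencies(counts, total) for counts, total, _ in toy.values()]
labels = [label for _, _, label in toy.values()]

print(leave_one_out_accuracy(rows, labels, FEATURES))  # full feature set
print(leave_one_out_accuracy(rows, labels, ["subj"]))  # backed-off subset
```

On this deliberately separable toy data both calls print 1.0, which says nothing about behaviour on a real corpus. Choosing midpoint thresholds by information gain is the usual way ID3/C4.5-style trees handle continuous features; C4.5 additionally uses the gain ratio and pruning, both omitted here for brevity.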