Identifying the lemma of a Named Entity is important for many Natural Language Processing applications, such as Information Retrieval. Here we introduce a novel approach to Named Entity lemmatisation that uses the occurrence frequency of each candidate lemma. We constructed four corpora in English and Hungarian and used them to train machine learning methods, obtaining simple decision rules based on the web frequencies of the lemmas. In our experiments, this web-based heuristic achieved an average accuracy of nearly 91%.
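
To make the idea of a frequency-based decision rule concrete, the following is a minimal Python sketch and not the rules learned in the work itself. It assumes a hypothetical suffix list, a hypothetical `frequency` callable standing in for a web hit-count lookup, and an illustrative `ratio` threshold; the counts in the usage note are invented toy values.

```python
from typing import Callable, List


def candidate_lemmas(token: str, suffixes: List[str]) -> List[str]:
    """Return the token plus every form obtained by stripping one known suffix."""
    candidates = [token]
    for suffix in suffixes:
        if suffix and token.endswith(suffix) and len(token) > len(suffix):
            candidates.append(token[: -len(suffix)])
    return candidates


def choose_lemma(
    token: str,
    suffixes: List[str],
    frequency: Callable[[str], int],
    ratio: float = 1.0,
) -> str:
    """Pick a lemma using a single frequency-ratio rule (illustrative only).

    The most frequent suffix-stripped candidate is accepted when its count is
    at least `ratio` times the count of the unmodified token; otherwise the
    token is kept as its own lemma. `ratio` is an assumed, hand-set parameter.
    """
    stripped = candidate_lemmas(token, suffixes)[1:]
    if not stripped:
        return token
    best = max(stripped, key=frequency)
    if frequency(best) >= ratio * frequency(token):
        return best
    return token


# Invented toy counts standing in for web frequencies (assumption).
TOY_COUNTS = {"Budapesten": 500, "Budapest": 9000, "Budapeste": 3,
              "Wilson": 8000, "Wilso": 5, "Wils": 20}


def toy_frequency(form: str) -> int:
    """Look up a form in the toy table; unseen forms count as zero."""
    return TOY_COUNTS.get(form, 0)
```

With these toy counts, `choose_lemma("Budapesten", ["en", "n"], toy_frequency)` selects the stripped form `Budapest`, while `choose_lemma("Wilson", ["on", "n"], toy_frequency)` keeps `Wilson`, because neither stripped form is frequent enough. A trained classifier, as described above, would replace the single hand-set threshold with rules learned from a corpus.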