Automatic acquisition of named entity tagged corpus from world wide web
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2
Design challenges and misconceptions in named entity recognition
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Analysing Wikipedia and gold-standard corpora for NER training
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Active learning with Amazon Mechanical Turk
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Learning multilingual named entity recognition from Wikipedia
Artificial Intelligence
Hi-index | 0.00 |
In this paper, we present a two-phase, hybrid model for generating training data for Named Entity Recognition systems. In the first phase, a trained annotator labels all named entities in a text irrespective of type. In the second phase, naïve crowdsourcing workers complete binary judgment tasks to indicate the type(s) of each entity. Decomposing the data generation task in this way results in a flexible, reusable corpus that accommodates changes to entity type taxonomies. In addition, it makes efficient use of precious trained annotator resources by leveraging highly available and cost effective crowdsourcing worker pools in a way that does not sacrifice quality.