Design challenges and misconceptions in named entity recognition
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Clustering techniques for open relation extraction
PhD '12 Proceedings of the on SIGMOD/PODS 2012 PhD Symposium
A weighting scheme for open information extraction
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
Hi-index | 0.00 |
The state-of-the-art in Named Entity Recognition relies on a combination of local features of the text and global knowledge to determine the types of the recognized entities. This is problematic in some cases, resulting in entities being classified as belonging to the wrong type. We show that using global information about the corpus improves the accuracy of type identification. We explore the notion of a global domain frequency that relates relation identifying terms with pairs of entity types which are used in that relation. We use this to identify entities whose types are not compatible with the terms they co-occur in the text. Our results on a large corpus of social media content allows the identification of mistyped entities with 70% accuracy.