One of the issues in bootstrapping for named entity recognition is how to control the annotation errors introduced at each iteration. In this paper, we present several heuristics for reducing such errors using external resources such as WordNet, an encyclopedia, and Web documents. The bootstrapping is applied to identifying and classifying fine-grained geographic named entities, which are useful for applications such as information extraction and question answering, as well as standard named entities such as PERSON and ORGANIZATION. The experiments show the usefulness of the suggested heuristics, and we report the learning curve evaluated at each bootstrapping loop. When our approach was applied to a newspaper corpus, it achieved an F1 score of 87, which is quite promising for the fine-grained named entity recognition task.
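The abstract does not spell out the algorithm, so the following is only a minimal, hypothetical sketch of the general idea: a bootstrapping loop that grows an entity lexicon from seeds and applies an error-control filter against an external resource before accepting new entries. The context patterns (just the single word to the left of an entity), the `min_count` threshold, and the `external_types` dictionary, which stands in for lookups in WordNet, an encyclopedia, or Web documents, are all illustrative assumptions, not the authors' heuristics.

```python
from collections import Counter, defaultdict


def learn_patterns(sentences, lexicon):
    """Count which class follows each left-context word (hypothetical pattern type)."""
    counts = defaultdict(Counter)  # left word -> Counter of entity classes
    for tokens in sentences:
        for i in range(1, len(tokens)):
            label = lexicon.get(tokens[i])
            if label is not None:
                counts[tokens[i - 1]][label] += 1
    return counts


def propose(sentences, lexicon, patterns, min_count=2):
    """Label unknown capitalized tokens whose left context is a reliable pattern."""
    candidates = {}
    for tokens in sentences:
        for i in range(1, len(tokens)):
            word = tokens[i]
            if word in lexicon or not word[:1].isupper():
                continue
            classes = patterns.get(tokens[i - 1])
            if not classes:
                continue
            label, count = classes.most_common(1)[0]
            if count >= min_count:
                candidates[word] = label
    return candidates


def filter_errors(candidates, external_types):
    """Error control: reject a candidate when the external resource assigns it
    a different class; accept it when the resource has no entry for it."""
    return {w: c for w, c in candidates.items() if external_types.get(w, c) == c}


def bootstrap(sentences, seeds, external_types, iterations=3):
    """Alternate pattern learning, candidate proposal, and filtering."""
    lexicon = dict(seeds)
    for _ in range(iterations):
        patterns = learn_patterns(sentences, lexicon)
        new = filter_errors(propose(sentences, lexicon, patterns), external_types)
        if not new:  # converged: nothing new survived the filter
            break
        lexicon.update(new)
    return lexicon


# Toy corpus: "to X" suggests a city, but "Kim" is a person.
sentences = [
    "flights to Seoul leave daily".split(),
    "flights to Busan leave daily".split(),
    "trains to Busan run hourly".split(),
    "trains to Daegu run hourly".split(),
    "talks to Kim ended".split(),
]
seeds = {"Seoul": "CITY", "Busan": "CITY"}

filtered = bootstrap(sentences, seeds, {"Kim": "PERSON"})
unfiltered = bootstrap(sentences, seeds, {})
print(filtered)    # {'Seoul': 'CITY', 'Busan': 'CITY', 'Daegu': 'CITY'}
print(unfiltered)  # 'Kim' is wrongly added as CITY without the filter
```

Running the loop with and without the external check shows the kind of error the abstract is concerned with: without filtering, the pattern "to X" mislabels "Kim" as a city, and that wrong entry would then feed later iterations.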