Heuristic methods for reducing errors of geographic named entities learned by bootstrapping

Authors:
Seungwoo Lee; Gary Geunbae Lee
Affiliations:
Department of Computer Science and Engineering, Pohang University of Science and Technology, Pohang, Republic of Korea (both authors)
Venue:
IJCNLP'05: Proceedings of the Second International Joint Conference on Natural Language Processing
Year:
2005

Abstract

One of the issues in bootstrapping for named entity recognition is how to control the annotation errors introduced at every iteration. In this paper, we present several heuristics for reducing such errors using external resources such as WordNet, encyclopedias, and Web documents. We apply bootstrapping to identify and classify fine-grained geographic named entities, which are useful for applications such as information extraction and question answering, in addition to standard named entities such as PERSON and ORGANIZATION. The experiments demonstrate the usefulness of the proposed heuristics and report the learning curve evaluated at each bootstrapping loop. When our approach was applied to a newspaper corpus, it achieved an F1 value of 87, which is quite promising for the fine-grained named entity recognition task.