Named entity recognition in Wikipedia

Authors:
Dominic Balasuriya;Nicky Ringland;Joel Nothman;Tara Murphy;James R. Curran
Affiliations:
University of Sydney, NSW, Australia;University of Sydney, NSW, Australia;University of Sydney, NSW, Australia;University of Sydney, NSW, Australia;University of Sydney, NSW, Australia
Venue:
People's Web '09 Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
Year:
2009


Abstract

Named entity recognition (NER) is used in many domains beyond the newswire text that makes up current gold-standard corpora. Recent work has used Wikipedia's link structure to automatically generate near gold-standard annotations. Until now, these resources have been evaluated only on newswire corpora or on themselves. We present the first NER evaluation on a Wikipedia gold standard (WG) corpus. Our analysis of cross-corpus performance on WG shows that Wikipedia text may be a harder NER domain than newswire. We find that an automatic annotation of Wikipedia has high agreement with WG and, when used as training data, outperforms newswire models by up to 7.7%.
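The abstract's idea of deriving annotations from Wikipedia's link structure can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a hypothetical `ARTICLE_CLASS` mapping from article titles to entity classes (in practice this would come from classifying articles), a simplified `[[target|anchor]]` link syntax, whitespace tokenisation, and IOB2 tags. All names here are illustrative.

```python
import re

# Hypothetical mapping from link target (article title) to an entity class.
# These entries are illustrative only and do not come from the paper.
ARTICLE_CLASS = {
    "University of Sydney": "ORG",
    "Australia": "LOC",
}

# Matches [[Target]] or [[Target|anchor text]] in simplified wiki markup.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def annotate(wikitext):
    """Return (token, tag) pairs in IOB2 format derived from wiki links."""
    tagged = []
    pos = 0
    for m in LINK.finditer(wikitext):
        # Text before the link is treated as outside any entity.
        tagged += [(tok, "O") for tok in wikitext[pos:m.start()].split()]
        target = m.group(1).strip()
        anchor = (m.group(2) or m.group(1)).strip()
        label = ARTICLE_CLASS.get(target)
        for i, tok in enumerate(anchor.split()):
            if label is None:
                # Links to unclassified articles yield no entity tag.
                tagged.append((tok, "O"))
            else:
                tagged.append((tok, ("B-" if i == 0 else "I-") + label))
        pos = m.end()
    # Remaining text after the last link.
    tagged += [(tok, "O") for tok in wikitext[pos:].split()]
    return tagged


if __name__ == "__main__":
    sentence = "She studied at the [[University of Sydney]] in [[Australia|Oz]] ."
    for token, tag in annotate(sentence):
        print(token, tag)
```

The tag comes from the link target rather than the anchor text, so an alias such as "Oz" still receives the class of the article it points to. Unlinked mentions are tagged `O` in this sketch, which is one reason automatically derived annotations are described as only near gold standard.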