A comparison of methods for the automatic identification of locations in Wikipedia

Authors:
Davide Buscaldi;Paolo Rosso
Affiliations:
Universidad Politécnica de Valencia, Valencia, Spain;Universidad Politécnica de Valencia, Valencia, Spain
Venue:
Proceedings of the 4th ACM workshop on Geographical information retrieval
Year:
2007



Abstract

In this paper we compare two methods for the automatic identification of geographical articles in encyclopedic resources such as Wikipedia: a WordNet-based method that uses a set of keywords related to geographical places, and a multinomial Naïve Bayes classifier trained on a randomly selected subset of the English Wikipedia. This task can be seen as part of the broader task of Named Entity classification, a well-known problem in Natural Language Processing. The experiments were carried out using both the full text of the articles and only the definition of the entity described in the article. The results show that the information contained in the page templates and the category labels is more useful than the text of the articles.
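
The two kinds of method named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword set, the toy training sentences, the tokenizer, and the names `GEO_KEYWORDS`, `keyword_is_geo`, and `MultinomialNB` are all assumptions made for illustration. In particular, the paper derives its keywords from WordNet, whereas the list here is hand-written. The text passed to either function could be a full article or only its opening definition.

```python
import math
from collections import Counter, defaultdict

# Hypothetical keyword list for illustration; the paper takes its keywords from WordNet.
GEO_KEYWORDS = {"city", "town", "river", "country", "capital", "mountain", "island"}


def tokenize(text):
    """Lowercase the text and split it on any non-alphanumeric character."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def keyword_is_geo(text, keywords=GEO_KEYWORDS, min_hits=1):
    """Keyword baseline: label a text geographical if enough keywords occur in it."""
    hits = sum(1 for token in tokenize(text) if token in keywords)
    return hits >= min_hits


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(word | class), ignoring unseen words."""
        tokens = [t for t in tokenize(text) if t in self.vocab]
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_docs.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n_docs / self.total_docs)
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + self.alpha) / (total + self.alpha * vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented for this sketch (not from the paper's corpus).
TRAIN = [
    ("Valencia is a city on the east coast of Spain", "geo"),
    ("The Danube is a river in central Europe", "geo"),
    ("Naive Bayes is a probabilistic classifier", "other"),
    ("WordNet is a lexical database for English", "other"),
]
model = MultinomialNB().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("a city near a river in Spain"))
print(keyword_is_geo("Valencia is a city in Spain"))
```

With only four training sentences the classifier is a toy. A realistic setup would train on many labelled articles and could add features drawn from page templates and category labels, which the abstract reports as more useful than the article text.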