Extracting Geospatial Entities from Wikipedia

  • Authors:
  • Jeremy Witmer; Jugal Kalita


  • Venue:
  • ICSC '09 Proceedings of the 2009 IEEE International Conference on Semantic Computing
  • Year:
  • 2009


Abstract

This paper addresses the challenge of extracting geospatial data from the article text of the English Wikipedia. In the first phase of our work, we create a training corpus and select a set of word-based features to train a Support Vector Machine (SVM) for the task of geospatial named entity recognition. For testing, we target a corpus of Wikipedia articles about battles and wars, as these have a high incidence of geospatial content. The SVM recognizes place names in the corpus with very high recall, close to 100%, and acceptable precision. The set of geospatial NEs is then fed into a geocoding and resolution process, whose goal is to determine the correct coordinates for each place name. Because many place names are ambiguous and do not immediately geocode to a single location, we present a data structure and algorithm that resolve this ambiguity using sentence and article context, so that the correct coordinates can be selected. We achieve an f-measure of 82% and create a set of geospatial entities for each article, combining the place names, spatial locations, and an assumed point geometry. These entities can enable geospatial search on, and geovisualization of, Wikipedia.
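The abstract does not spell out the resolution algorithm, but the idea of choosing among ambiguous geocoding candidates using surrounding context can be sketched. The snippet below is a minimal illustration, not the paper's method: it assumes a simple centroid heuristic, picking the candidate coordinate closest to the centroid of unambiguous place names from the same article. The function names and the heuristic itself are illustrative assumptions.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def resolve(candidates, context_points):
    """Illustrative heuristic (not the paper's algorithm): pick the
    candidate coordinate closest to the centroid of coordinates of
    unambiguous place names found in the article context."""
    lat = sum(p[0] for p in context_points) / len(context_points)
    lon = sum(p[1] for p in context_points) / len(context_points)
    return min(candidates, key=lambda c: haversine_km(c, (lat, lon)))

# "Paris" is ambiguous: Paris, France vs. Paris, Texas.
# Unambiguous context mentions London and Berlin, so the
# European candidate wins.
paris_candidates = [(48.85, 2.35), (33.66, -95.55)]
context = [(51.51, -0.13), (52.52, 13.40)]
print(resolve(paris_candidates, context))  # → (48.85, 2.35)
```

A real resolver would also weight candidates by population or gazetteer prominence and by proximity within the sentence, but the centroid step captures the core intuition of context-based disambiguation described in the abstract.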