Extracting metadata for spatially-aware information retrieval on the internet

  • Authors: Paul Clough
  • Affiliations: University of Sheffield, Sheffield, UK
  • Venue: Proceedings of the 2005 workshop on Geographic information retrieval
  • Year: 2005

Abstract

This paper presents methods for extracting geospatial information from web pages for use in SPIRIT, a new Geographic Information Retrieval (GIR) system for the web. The resulting geospatial markup tools have been used to annotate around 900,000 web pages taken from a 1 TB web crawl focused on regions in the UK, France, Germany and Switzerland. The paper describes a versatile geo-parsing tool for extracting spatial metadata, built on the GATE Information Extraction (IE) system, and a simple geo-coding program that uses default sense to assign spatial coordinates to extracted locations. A preliminary analysis of markup accuracy for geo-parsing and geo-coding is provided, and an initial statistical and geographical analysis of the SPIRIT collection is presented.
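
To make the idea of default-sense geo-coding concrete, the sketch below shows one possible reading of it: every place name maps to a single preferred referent, chosen without looking at the surrounding text. This is not the SPIRIT implementation. The gazetteer, the `Sense` type, the `rank` field and the `geocode_default_sense` function are all hypothetical, and the coordinates are rough illustrative values.

```python
from typing import NamedTuple, Optional


class Sense(NamedTuple):
    """One candidate referent for a place name (hypothetical structure)."""
    name: str
    country: str
    lat: float
    lon: float
    rank: int  # assumed prominence ordering: lower means more prominent


# Toy gazetteer; entries and coordinates are illustrative only.
GAZETTEER = {
    "sheffield": [
        Sense("Sheffield", "US", 34.76, -87.70, 2),
        Sense("Sheffield", "UK", 53.38, -1.47, 1),
    ],
    "paris": [
        Sense("Paris", "FR", 48.86, 2.35, 1),
    ],
}


def geocode_default_sense(place_name: str) -> Optional[Sense]:
    """Return the default (most prominent) sense for a place name.

    Context is ignored entirely: an ambiguous name always resolves to
    the same referent. Returns None if the name is not in the gazetteer.
    """
    senses = GAZETTEER.get(place_name.strip().lower())
    if not senses:
        return None
    return min(senses, key=lambda s: s.rank)
```

In this sketch, "Sheffield" always resolves to the UK entry because it carries the lowest rank, whatever the page is about. That is the trade-off of a default-sense approach: it is simple and fast, but it cannot recover the less prominent referent when the text actually means it.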