Ontology-driven discovery of geospatial evidence in web pages

  • Authors:
  • Karla A. Borges; Clodoveu A. Davis, Jr.; Alberto H. Laender; Claudia Bauzer Medeiros

  • Affiliations:
  • PRODABEL-Empresa de Informática e Informação do Município de Belo Horizonte, Belo Horizonte, Brazil 31230-000 and Departamento de Ciência da Computação, Universi ...; Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil 31270-010; Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil 31270-010; Instituto de Informática, Universidade de Campinas, Campinas, Brazil 13083-970

  • Venue:
  • Geoinformatica
  • Year:
  • 2011

Abstract

When users need to find something on the Web that is related to a place, they typically submit place names to a search engine along with other keywords. However, automatic recognition of the geographic characteristics embedded in Web documents, which would allow documents to be connected to places more reliably, remains a difficult task. We propose an ontology-driven approach to facilitate the process of recognizing, extracting, and geocoding partial or complete references to places embedded in text. Our approach combines an extraction ontology with urban gazetteers and geocoding techniques. This ontology, called OnLocus, is used to guide the discovery of geospatial evidence in the contents of Web pages. We show that addresses and positioning expressions, along with fragments such as postal codes or telephone area codes, provide satisfactory support for local search applications, since they make it possible to approximate the physical location of services and activities named within Web pages. Our experiments show the feasibility of performing automated address extraction and geocoding to identify locations associated with Web pages. Combining location identifiers with basic addresses improved the precision of extractions and reduced the number of false positives.
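
To make the general idea concrete, the following is a minimal Python sketch of extracting a basic address and a postal code from page text and resolving the postal code against a gazetteer. It is not the OnLocus ontology or the authors' implementation: the regular expressions, the function names `extract_evidence` and `geocode`, the `TOY_GAZETTEER` table, and its coordinates are all assumptions made for illustration. The only detail taken from this record is the Brazilian postal code format seen in the affiliations (five digits, a hyphen, three digits).

```python
import re

# Hypothetical toy gazetteer: postal code -> (latitude, longitude).
# The coordinates are illustrative placeholders, not surveyed values.
TOY_GAZETTEER = {
    "31270-010": (-19.87, -43.96),
    "13083-970": (-22.82, -47.07),
}

# Brazilian postal code (CEP): five digits, a hyphen, three digits.
CEP_PATTERN = re.compile(r"\b\d{5}-\d{3}\b")

# Assumed "basic address" shape: a street-type word, a capitalized
# street name, an optional comma, and a building number.
ADDRESS_PATTERN = re.compile(
    r"\b(?:Rua|Avenida|Av\.|Praça)\s+[A-ZÀ-Ý][\w\s.]*?,?\s+\d{1,5}\b"
)


def extract_evidence(text):
    """Collect candidate addresses and postal codes found in the text."""
    return {
        "addresses": ADDRESS_PATTERN.findall(text),
        "postal_codes": CEP_PATTERN.findall(text),
    }


def geocode(evidence, gazetteer=TOY_GAZETTEER):
    """Return coordinates for the first postal code known to the gazetteer."""
    for cep in evidence["postal_codes"]:
        if cep in gazetteer:
            return gazetteer[cep]
    return None  # no usable geospatial evidence
```

For example, calling `extract_evidence` on a sentence such as "Visite nossa loja na Avenida Antônio Carlos, 6627, CEP 31270-010, Belo Horizonte." yields one candidate address and one postal code, and `geocode` then maps the postal code to the placeholder coordinates. A page that yields both kinds of evidence gives a stronger signal than a page with an address pattern alone, which is the intuition behind combining location identifiers with basic addresses.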