Semantic extraction of geographic data from web tables for big data integration

  • Authors:
  • Isabel F. Cruz; Venkat R. Ganesh; Seyed Iman Mirrezaei

  • Affiliations:
  • University of Illinois at Chicago; University of Illinois at Chicago; University of Illinois at Chicago

  • Venue:
  • Proceedings of the 7th Workshop on Geographic Information Retrieval
  • Year:
  • 2013

Abstract

Millions of web tables contain geographic data pertinent to big data integration in a variety of application domains, such as urban sustainability, transportation networks, policy studies, and public health. These tables, however, are heterogeneous in structure, concepts, and metadata. One of the challenges in semantically extracting geographic data is the need to resolve these heterogeneities so as to uncover a conceptual hierarchy, metadata associated with instances, and geographic information, corresponding respectively to ontologies, elements that we call features, and cell values that can be used to identify geographic coordinates. In this paper, we present an architecture with methods to: (1) extract feature-rich web tables; (2) identify features; (3) construct a schema and instances using RDF; and (4) perform geocoding. Preliminary experiments led to high accuracy in table identification and feature naming, even when compared with manual evaluation.
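
To make steps (3) and (4) concrete, the following is a minimal sketch and not the authors' implementation. It assumes a table has already been extracted as a header plus rows, treats each column name as a feature, emits N-Triples-style lines, and geocodes by lookup in a tiny in-memory gazetteer. The namespace `http://example.org/`, the function names `table_to_triples` and `geocode`, the `GAZETTEER` dictionary, and the table values are all hypothetical and chosen only for illustration.

```python
from urllib.parse import quote

# Hypothetical namespace and gazetteer (approximate coordinates), for illustration only.
BASE = "http://example.org/"
GAZETTEER = {"Chicago": (41.8781, -87.6298)}


def table_to_triples(header, rows, key_column=0):
    """Emit one N-Triples-style line per cell.

    Each row becomes a subject (named after its key cell) and each
    column name becomes a predicate, standing in for a "feature".
    """
    triples = []
    for row in rows:
        subject = f"<{BASE}instance/{quote(row[key_column])}>"
        for name, value in zip(header, row):
            predicate = f"<{BASE}feature/{quote(name)}>"
            # Escape backslashes and double quotes inside the literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


def geocode(rows, place_column, gazetteer=GAZETTEER):
    """Return (lat, lon) for each row's place cell, or None if unknown."""
    return [gazetteer.get(row[place_column]) for row in rows]


# Toy table with made-up values.
header = ["City", "Stations"]
rows = [["Chicago", "12"], ["Atlantis", "3"]]

triples = table_to_triples(header, rows)
coords = geocode(rows, place_column=0)

for line in triples:
    print(line)
print(coords)
```

A real system would replace the dictionary lookup with a geocoding service and would need to disambiguate place names; the sketch only shows how cell values can feed both the RDF instances and the coordinate lookup.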