Semantic extraction of geographic data from web tables for big data integration

Authors:
Isabel F. Cruz;Venkat R. Ganesh;Seyed Iman Mirrezaei
Affiliations:
University of Illinois at Chicago;University of Illinois at Chicago;University of Illinois at Chicago
Venue:
Proceedings of the 7th Workshop on Geographic Information Retrieval
Year:
2013

Abstract

There are millions of web tables with geographic data that are pertinent to big data integration in a variety of domain applications, such as urban sustainability, transportation networks, policy studies, and public health. These tables, however, are heterogeneous in structure, concepts, and metadata. One of the challenges in semantically extracting geographic data is the need to resolve these heterogeneities so as to uncover a conceptual hierarchy, metadata associated with instances, and geographic information, corresponding respectively to ontologies, elements that we call features, and cell values that can be used to identify geographic coordinates. In this paper, we present an architecture with methods to: (1) extract feature-rich web tables; (2) identify features; (3) construct a schema and instances using RDF; and (4) perform geocoding. Preliminary experiments led to high accuracy in table identification and feature naming, even when compared to manual evaluation.
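As a rough illustration of steps (3) and (4), the sketch below turns one already-extracted table into N-Triples and attaches coordinates to each row. It is not the authors' implementation: the `http://example.org/` namespace, the `lat` and `long` property names, the function `table_to_ntriples`, and the in-memory `GAZETTEER` (a stand-in for a real geocoding service) are all assumptions made for this example.

```python
from urllib.parse import quote

# Hypothetical namespace and toy gazetteer; neither comes from the paper.
BASE = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
GAZETTEER = {"Chicago": (41.88, -87.63)}  # stand-in for a geocoding service


def iri(local):
    """Build an IRI in the example namespace from a local name."""
    return f"<{BASE}{quote(local.strip().replace(' ', '_'))}>"


def literal(value):
    """Serialize a value as a plain N-Triples string literal."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def table_to_ntriples(concept, header, rows, place_column):
    """Emit schema triples (concept, features) and instance triples (rows)."""
    # Schema: the table's concept becomes a class, each feature a property.
    triples = [f"{iri(concept)} <{RDF_TYPE}> <{RDFS_CLASS}> ."]
    for feature in header:
        triples.append(f"{iri(feature)} <{RDF_TYPE}> <{RDF_PROPERTY}> .")

    place_index = header.index(place_column)
    for i, row in enumerate(rows):
        # Instances: one subject per row, one triple per cell value.
        subject = iri(f"{concept}/{i}")
        triples.append(f"{subject} <{RDF_TYPE}> {iri(concept)} .")
        for feature, cell in zip(header, row):
            triples.append(f"{subject} {iri(feature)} {literal(cell)} .")
        # Geocoding: look up the place cell and attach coordinates if known.
        coords = GAZETTEER.get(row[place_index])
        if coords is not None:
            lat, lon = coords
            triples.append(f"{subject} {iri('lat')} {literal(lat)} .")
            triples.append(f"{subject} {iri('long')} {literal(lon)} .")
    return triples


if __name__ == "__main__":
    for line in table_to_ntriples(
        "Station", ["name", "city"], [["Union", "Chicago"]], "city"
    ):
        print(line)
```

Rows whose place value is missing from the gazetteer are still emitted, only without coordinates, so a failed lookup does not discard the rest of the instance data.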