Exploration and comparison of geographic information sources using distance statistics

  • Authors:
  • Christian Sengstock; Michael Gertz

  • Affiliations:
  • Heidelberg University, Germany; Heidelberg University, Germany

  • Venue:
  • Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
  • Year:
  • 2011

Abstract

Given the steadily increasing amount of geographic information on the Web, there is a strong need for suitable methods in exploratory data analysis that can be used to efficiently describe the characteristics of such large-scale, often noisy datasets. Existing methods in spatial data mining focus primarily on mining patterns that describe spatial proximity relationships, such as co-location patterns or spatial association rules. In this paper, we present a novel approach to describe the spatial characteristics of geographic information sources composed of instances of geographic features. Using the concept of interaction characteristics of geographic features, similarities in how features are distributed in space can be computed, and interesting patterns of similar features in the datasets regarding their geographic semantics (landmark, local, regional, global) can be determined. To this end, we apply clustering techniques to spatial distance statistics. We discuss the properties of our method and detail a comprehensive evaluation using publicly available datasets (Flickr, Twitter, OpenStreetMap). We demonstrate the feasibility of identifying groups of geographic features with distinct geographic semantics, which can then be used to select subsets of features for subsequent learning tasks or to compare different datasets.
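
The general idea of grouping features by distance statistics can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify which distance statistic or clustering technique is used, so the pairwise-distance profile, the greedy threshold grouping, the toy coordinates, and all function names below are assumptions chosen for illustration. Coordinates are treated as planar; real geographic data would need great-circle distances.

```python
import math
from itertools import combinations


def distance_profile(points, radii):
    """Fraction of point pairs whose Euclidean distance is <= each radius.

    Assumed stand-in for a spatial distance statistic; planar coordinates.
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    if not dists:
        return [0.0 for _ in radii]
    return [sum(d <= r for d in dists) / len(dists) for r in radii]


def profile_gap(a, b):
    """Largest absolute difference between two profiles."""
    return max(abs(x - y) for x, y in zip(a, b))


def group_features(profiles, tol):
    """Greedy grouping (a simple stand-in for a clustering technique):
    a feature joins the first group whose seed profile is within tol,
    otherwise it starts a new group."""
    groups = []  # list of (seed_profile, [feature names])
    for name, prof in profiles.items():
        for seed, members in groups:
            if profile_gap(seed, prof) <= tol:
                members.append(name)
                break
        else:
            groups.append((prof, [name]))
    return [members for _, members in groups]


# Hypothetical toy features: two tightly concentrated, one widely spread.
features = {
    "landmark_a": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)],
    "landmark_b": [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)],
    "regional_c": [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)],
}
radii = [1.0, 5.0, 20.0]  # assumed distance thresholds
profiles = {name: distance_profile(pts, radii) for name, pts in features.items()}
groups = group_features(profiles, tol=0.25)
print(groups)
```

In this toy setup the two concentrated features share a profile and fall into one group even though they sit in different places, while the widely spread feature forms its own group. This mirrors the abstract's point that features are compared by how they are distributed in space, not by where they are.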