A supervised machine learning approach for duplicate detection over gazetteer records
GeoS'11 Proceedings of the 4th international conference on GeoSpatial semantics
Due to the growing interest in geospatial data mining and analysis, data cleaning and integration for geospatial data are becoming important issues. Geospatial entity resolution is the process of reconciling multiple location references to the same real-world location, either within a single data source (deduplication) or across multiple data sources (integration). In this paper, we introduce an interactive tool called GeoDDupe, which combines automatic data mining algorithms for geospatial entity resolution with a novel network visualization that supports users' resolution analysis and decisions. We illustrate the GeoDDupe interface with an example geospatial dataset and show how users can efficiently and accurately resolve location entities. Finally, a case study with two real-world geospatial datasets demonstrates the potential of GeoDDupe.
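To make the deduplication case concrete, the following is a minimal sketch of how candidate duplicate pairs might be flagged within a single data source. It is not GeoDDupe's algorithm, which the abstract does not specify; the record layout `(name, lat, lon)`, the function names, and the thresholds `min_name_sim` and `max_km` are all illustrative assumptions.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_duplicates(records, min_name_sim=0.8, max_km=1.0):
    """Return index pairs (i, j), i < j, whose names are similar and whose
    coordinates are close. Both thresholds are assumed values, not taken
    from the paper. records: list of (name, lat, lon) tuples."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            name_i, lat_i, lon_i = records[i]
            name_j, lat_j, lon_j = records[j]
            if (name_similarity(name_i, name_j) >= min_name_sim
                    and haversine_km(lat_i, lon_i, lat_j, lon_j) <= max_km):
                pairs.append((i, j))
    return pairs
```

A pairwise comparison like this only proposes candidates; in an interactive setting such as the one the abstract describes, the final merge decision would remain with the user.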