Finding the farm: postal address-based building clustering

  • Authors:
  • Christopher Eby; Alice Armstrong

  • Affiliations:
  • Shippensburg University, Waynesboro, PA; Shippensburg University, Shippensburg, PA

  • Venue:
  • Proceedings of the 2nd International Conference on Computing for Geospatial Research & Applications
  • Year:
  • 2011

Abstract

Geocoding, the act of mapping place names and addresses to locations on digital maps, is an important feature of many geographic information systems. However, traditional geocoding algorithms can be very inaccurate, especially in rural areas. Land plot maps maintained by local governments can be used to increase accuracy, but they are not always available. A constraint satisfaction method proposed by Michalowski and Knoblock has the potential to greatly increase accuracy by exploiting two widely available datasets, phone book addresses and building locations derived from aerial photographs, but it may still be inaccurate when the number of buildings does not match the number of addresses. Therefore, this research investigates the accuracy of a method that takes addresses and building locations and groups the buildings into clusters, where each cluster contains the buildings present at a single address. Three clustering algorithms (k-means, complete-link, and a minimum spanning tree-based algorithm) are tested on building locations gathered from aerial photographs of predominantly rural Fulton County, PA, to determine which produces the most accurate clusters. A secondary hypothesis tests whether geolocating to a cluster's centroid or to the building within the cluster that is closest to the road produces locations closer to the address locations provided by Fulton County. If these two experiments yield accurate results, the methods can serve as an important preprocessing step in a geocoding system based on Michalowski and Knoblock's method.
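The clustering and geolocation steps described above can be illustrated with a small sketch. It is not the authors' implementation. It assumes a generic MST-based clustering in which Kruskal's algorithm is stopped once k components remain (equivalent to deleting the k-1 longest MST edges), with k taken to be the number of addresses, and it assumes the road is modeled as a single straight segment. The function names `mst_clusters`, `centroid`, `dist_to_segment`, and `closest_to_road` are hypothetical, and k-means and complete-link are not shown.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def mst_clusters(points: List[Point], k: int) -> List[List[int]]:
    """Group building indices into k clusters (hypothetical helper).

    Runs Kruskal's algorithm over all pairwise distances and stops once
    k components remain, which is the same as removing the k-1 longest
    edges of the minimum spanning tree. Assumes k = number of addresses.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of buildings")
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All candidate edges, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    components = n
    for _, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two different clusters
            parent[ri] = rj
            components -= 1

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def centroid(points: List[Point], members: List[int]) -> Point:
    """Candidate location 1: mean of the cluster's building coordinates."""
    xs = [points[i][0] for i in members]
    ys = [points[i][1] for i in members]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.dist(p, a)
    # Project p onto the segment and clamp to its endpoints.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def closest_to_road(points: List[Point], members: List[int],
                    road: Tuple[Point, Point]) -> Point:
    """Candidate location 2: the cluster's building nearest the road."""
    best = min(members, key=lambda i: dist_to_segment(points[i], *road))
    return points[best]


if __name__ == "__main__":
    buildings = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
    road = ((5.0, -5.0), (5.0, 15.0))  # hypothetical straight road x = 5
    clusters = mst_clusters(buildings, k=2)
    print(clusters)  # [[0, 1, 2], [3, 4]]
    for c in clusters:
        print(centroid(buildings, c), closest_to_road(buildings, c, road))
```

Either candidate point could then be compared with a reference address location (for example, by `math.dist`) to evaluate the secondary hypothesis. Stopping Kruskal early avoids building the full tree, but the all-pairs edge list is quadratic in the number of buildings, so a spatial index would likely be needed for a full county.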