Weighted multi-attribute matching of user-generated points of interest

  • Authors:
  • Grant McKenzie;Krzysztof Janowicz;Benjamin Adams

  • Affiliations:
  • University of California, Santa, Barbara;University of California, Santa, Barbara;University of California, Santa, Barbara

  • Venue:
  • Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

To a large degree, the attraction of Big Data lies in the variety of its heterogeneous multi-thematic and multi-dimensional data sources and not merely its volume. To fully exploit this variety, however, requires conflation. This is a two step process. First, one has to establish identity relations between information entities across the different data sources; and second, attribute values have to be merged according to certain procedures which avoid logical contradictions. The first step, also called matching, can be thought of as a weighted combination of common attributes according to some similarity measures. In this work, we propose such a matching based on multiple attributes of Points of Interests (POI) from the Location-based Social Network Foursquare and the Yelp local directory service. While both contain overlapping attributes that can be use for matching, they have specific strengths and weaknesses which makes their conflation desirable. We present a weighted multi-attribute matching strategy and evaluate its performance. Our strategy can automatically match 97% of randomly selected Yelp POI to their corresponding Foursquare entities.