To a large degree, the attraction of Big Data lies in the variety of its heterogeneous, multi-thematic, and multi-dimensional data sources, not merely in its volume. Fully exploiting this variety, however, requires conflation, which is a two-step process. First, identity relations have to be established between information entities across the different data sources; second, attribute values have to be merged according to procedures that avoid logical contradictions. The first step, also called matching, can be thought of as a weighted combination of common attributes according to some similarity measures. In this work, we propose such a matching based on multiple attributes of Points of Interest (POI) from the location-based social network Foursquare and the Yelp local directory service. While both contain overlapping attributes that can be used for matching, each has specific strengths and weaknesses, which makes their conflation desirable. We present a weighted multi-attribute matching strategy and evaluate its performance. Our strategy can automatically match 97% of randomly selected Yelp POI to their corresponding Foursquare entities.
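The idea of matching as a weighted combination of attribute similarities can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the work itself: the three attributes (name, location, category), the weights, the 500 m distance cutoff, the 0.7 acceptance threshold, and the choice of `difflib` string similarity and haversine distance as similarity measures.

```python
import math
from difflib import SequenceMatcher

# Hypothetical attribute weights (sum to 1); the abstract does not state the actual values.
WEIGHTS = {"name": 0.5, "location": 0.3, "category": 0.2}
MAX_DISTANCE_M = 500.0  # assumed cutoff: location similarity is 0 at or beyond this distance


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a spherical Earth."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


def location_similarity(a, b):
    """Linear decay from 1 (same point) to 0 (MAX_DISTANCE_M or farther)."""
    d = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
    return max(0.0, 1.0 - d / MAX_DISTANCE_M)


def category_similarity(a, b):
    """Exact, case-insensitive category agreement."""
    return 1.0 if a["category"].lower() == b["category"].lower() else 0.0


def match_score(a, b):
    """Weighted combination of the per-attribute similarities."""
    return (WEIGHTS["name"] * name_similarity(a["name"], b["name"])
            + WEIGHTS["location"] * location_similarity(a, b)
            + WEIGHTS["category"] * category_similarity(a, b))


def best_match(poi, candidates, threshold=0.7):
    """Return the highest-scoring candidate, or None if none reaches the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: match_score(poi, c))
    return best if match_score(poi, best) >= threshold else None


# Made-up example records, not real Yelp or Foursquare data.
yelp_poi = {"name": "Blue Bottle Coffee", "lat": 37.7763, "lon": -122.4233, "category": "coffee"}
foursquare_pois = [
    {"name": "Blue Bottle Coffee Co.", "lat": 37.7764, "lon": -122.4232, "category": "Coffee"},
    {"name": "Joe's Pizza", "lat": 37.7900, "lon": -122.4000, "category": "pizza"},
]
match = best_match(yelp_poi, foursquare_pois)
```

In this sketch the first candidate is selected because its name, location, and category all agree closely with the Yelp record, while the second candidate scores low on every attribute. In practice, weights and the threshold would have to be tuned against a set of known matches.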