SpSJoin: parallel spatial similarity joins

Authors:
Jaime Ballesteros;Ariel Cary;Naphtali Rishe
Affiliations:
Florida International University, Miami, FL;Florida International University, Miami, FL;Florida International University, Miami, FL
Venue:
Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
Year:
2011

References (6)

Efficient exact set-similarity joins. VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Spatial join techniques. ACM Transactions on Database Systems (TODS)
MapReduce: simplified data processing on large clusters. Communications of the ACM - 50th anniversary issue: 1958 - 2008
Efficient similarity joins for near duplicate detection. Proceedings of the 17th international conference on World Wide Web
Efficient parallel set-similarity joins using MapReduce. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Efficient and scalable method for processing top-k spatial Boolean queries. SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management

Cited by (1)

Spatio-textual similarity joins. Proceedings of the VLDB Endowment

Abstract

A spatial similarity join of two geospatial datasets finds pairs of records that are simultaneously similar on spatial and textual attributes. Such a join is useful for a variety of applications, such as data cleansing, record linkage, duplicate detection and geocoding enhancement. Efficient techniques exist for joining on either spatial or textual attributes alone, but the combined problem has received much less research attention. This paper presents the SpSJoin (Spatial Similarity Join) system to address this need. SpSJoin is a platform that merges geospatial and text processing techniques to perform spatial similarity joins efficiently. The platform leverages parallel computing with MapReduce to tackle scalability issues in joining large datasets. The efficiency of the proposed techniques is experimentally validated with a join case that improves the geolocation of entities in a real geospatial dataset using reference entities from another dataset.
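
To make the join definition above concrete, here is a minimal single-process sketch written in a map/shuffle/reduce style. It is not the SpSJoin implementation: the abstract does not specify the partitioning scheme or the similarity measures, so the uniform grid, the Euclidean distance on planar coordinates, the Jaccard similarity on whitespace tokens, and every name and threshold (cell_of, map_phase, spatial_similarity_join, max_dist, min_sim) are assumptions chosen only for illustration.

```python
from collections import defaultdict
from itertools import product
from math import floor, hypot


def tokens(text):
    """Lower-cased whitespace tokens of a textual attribute (assumed tokenizer)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cell_of(x, y, size):
    """Grid cell containing the point (x, y) for square cells of side `size`."""
    return (floor(x / size), floor(y / size))


def map_phase(records, tag, size, replicate):
    """Emit (cell, (tag, record)) pairs.

    Records are (id, x, y, text) tuples. With replicate=True a record is also
    sent to the 8 neighbouring cells, so that any partner within `size` of it
    is guaranteed to share at least one cell with it.
    """
    offsets = (-1, 0, 1) if replicate else (0,)
    for rec in records:
        cx, cy = cell_of(rec[1], rec[2], size)
        for dx, dy in product(offsets, repeat=2):
            yield (cx + dx, cy + dy), (tag, rec)


def spatial_similarity_join(r, s, max_dist, min_sim):
    """Return sorted (r_id, s_id) pairs that are close in space AND in text."""
    # "Shuffle": group the mapped records by grid cell (cell side = max_dist).
    groups = defaultdict(lambda: {"R": [], "S": []})
    for key, (tag, rec) in map_phase(r, "R", max_dist, replicate=False):
        groups[key][tag].append(rec)
    for key, (tag, rec) in map_phase(s, "S", max_dist, replicate=True):
        groups[key][tag].append(rec)

    # "Reduce": verify candidate pairs inside each cell independently.
    # Each R record lives in exactly one cell, so no pair is reported twice.
    pairs = []
    for bucket in groups.values():
        for (rid, rx, ry, rtext), (sid, sx, sy, stext) in product(bucket["R"], bucket["S"]):
            close = hypot(rx - sx, ry - sy) <= max_dist
            if close and jaccard(tokens(rtext), tokens(stext)) >= min_sim:
                pairs.append((rid, sid))
    return sorted(pairs)


# Made-up sample records: (id, x, y, text)
r = [("r1", 0.0, 0.0, "Joe's Pizza Miami"), ("r2", 10.0, 10.0, "City Library")]
s = [("s1", 0.5, 0.5, "joe's pizza"),
     ("s2", 0.4, 0.0, "Main Street Pharmacy"),
     ("s3", 50.0, 50.0, "City Library")]

print(spatial_similarity_join(r, s, max_dist=1.0, min_sim=0.5))  # [('r1', 's1')]
```

In this sketch only r1 and s1 qualify: s2 is near r1 but shares no tokens with it, and s3 matches r2 textually but is far away. Because each grid cell is processed independently, the reduce step is the part that a MapReduce job could run in parallel across cells.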