A spatial similarity join of two geospatial datasets finds pairs of records that are similar on both spatial and textual attributes. Such a join is useful for a variety of applications, such as data cleansing, record linkage, duplicate detection, and geocoding enhancement. Efficient techniques exist for joins on either spatial or textual attributes individually. However, the combined problem has received much less research attention. This paper presents the SpSJoin (Spatial Similarity Join) system to address this need. SpSJoin is a platform that combines geospatial and text processing techniques to perform spatial similarity joins efficiently. The platform leverages parallel computing with MapReduce to tackle scalability issues in joining large datasets. The efficiency of the proposed techniques is experimentally validated with a join case that improves the geolocation of entities in a real geospatial dataset using reference entities from another dataset.
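To make the join predicate concrete, the following is a minimal single-machine sketch of a spatial similarity join. It is not the SpSJoin algorithm and uses no MapReduce; it is a naive nested-loop baseline under assumptions that the abstract does not state: records are hypothetical (id, x, y, text) tuples, spatial similarity is a Euclidean distance threshold, and textual similarity is Jaccard similarity over lowercase whitespace tokens. The function and parameter names are illustrative only.

```python
from itertools import product
from math import hypot


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed textual measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def naive_spatial_similarity_join(left, right, max_dist, min_sim):
    """Return (left_id, right_id) pairs that satisfy both predicates.

    Each record is a hypothetical (id, x, y, text) tuple. This compares
    every pair, so it costs O(|left| * |right|) and is only a baseline.
    """
    pairs = []
    for (lid, lx, ly, ltext), (rid, rx, ry, rtext) in product(left, right):
        # Spatial predicate: Euclidean distance within the threshold.
        if hypot(lx - rx, ly - ry) > max_dist:
            continue
        # Textual predicate: Jaccard similarity of the token sets.
        ltokens = set(ltext.lower().split())
        rtokens = set(rtext.lower().split())
        if jaccard(ltokens, rtokens) >= min_sim:
            pairs.append((lid, rid))
    return pairs


left = [(1, 0.0, 0.0, "Central Park Cafe"), (2, 10.0, 10.0, "City Library")]
right = [
    ("a", 0.1, 0.0, "central park cafe"),
    ("b", 0.0, 0.1, "Harbor Museum"),
    ("c", 50.0, 50.0, "City Library"),
]
result = naive_spatial_similarity_join(left, right, max_dist=1.0, min_sim=0.5)
print(result)
```

In this toy input only records 1 and "a" are both close together and textually similar; "b" is nearby but has a different name, and "c" has a matching name but lies far away. The quadratic pairwise comparison is what becomes impractical on large datasets and what motivates a parallel approach.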