Given a collection of objects that carry both spatial and textual information, a spatio-textual similarity join retrieves the pairs of objects that are spatially close and textually similar. As an example, consider a social network whose users are tagged both spatially and textually (i.e., with their locations and profiles). A useful task, for instance in friendship recommendation, is to find pairs of persons who are spatially close and whose profiles overlap substantially (i.e., they have common interests). Another application is data de-duplication (e.g., finding photographs that are spatially close to each other and have a high overlap in their descriptive tags). Despite the importance of this operation, very little previous work studies its efficient evaluation, and that work uses a different definition in which only the best match for each object is identified. In this paper, we combine ideas from state-of-the-art spatial distance join and set similarity join methods and propose efficient algorithms that take into account both spatial and textual constraints. In addition, we propose a batch processing technique that boosts the performance of our approaches. An experimental evaluation using real and synthetic datasets shows that our optimized techniques are orders of magnitude faster than baseline solutions.
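To make the join semantics concrete, the following is a minimal sketch of a naive nested-loop evaluation, that is, the kind of baseline the optimized techniques are compared against, not the algorithms proposed in the paper. It assumes Euclidean distance for spatial closeness and Jaccard similarity over tag sets for textual similarity; the abstract does not fix either measure, and the names `naive_st_join`, `eps`, and `theta` are hypothetical.

```python
from itertools import combinations
from math import dist


def jaccard(a, b):
    """Jaccard similarity of two sets; taken as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def naive_st_join(objects, eps, theta):
    """Return id pairs that are within distance eps and have Jaccard >= theta.

    objects: list of (id, (x, y), set_of_tokens) tuples.
    eps, theta: assumed spatial and textual thresholds (illustrative only).
    Compares every pair, so the cost is quadratic in the number of objects.
    """
    result = []
    for (id1, loc1, tok1), (id2, loc2, tok2) in combinations(objects, 2):
        # Check the spatial predicate first, then the textual one.
        if dist(loc1, loc2) <= eps and jaccard(tok1, tok2) >= theta:
            result.append((id1, id2))
    return result


# Toy data in the spirit of the social-network example: location plus interests.
people = [
    ("ann", (0.0, 0.0), {"hiking", "jazz", "chess"}),
    ("bob", (0.3, 0.4), {"hiking", "jazz", "film"}),
    ("cat", (5.0, 5.0), {"hiking", "jazz", "chess"}),
    ("dan", (0.1, 0.1), {"golf"}),
]

pairs = naive_st_join(people, eps=1.0, theta=0.5)
print(pairs)
```

With these thresholds only `ann` and `bob` qualify: `cat` shares all of `ann`'s interests but is too far away, and `dan` is nearby but shares no interests. Every pair is examined, which is exactly the quadratic cost that index-based and filtering-based join methods aim to avoid.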