Scalable top-k spatial keyword search

  • Authors:
  • Dongxiang Zhang;Kian-Lee Tan;Anthony K. H. Tung

  • Affiliations:
  • National University of Singapore, Singapore;National University of Singapore, Singapore;National University of Singapore, Singapore

  • Venue:
  • Proceedings of the 16th International Conference on Extending Database Technology
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this big data era, huge amounts of spatial documents have been generated everyday through various location based services. Top-k spatial keyword search is an important approach to exploring useful information from a spatial database. It retrieves k documents based on a ranking function that takes into account both textual relevance (similarity between the query and document keywords) and spatial relevance (distance between the query and document locations). Various hybrid indexes have been proposed in recent years which mainly combine the R-tree and the inverted index so that spatial pruning and textual pruning can be executed simultaneously. However, the rapid growth in data volume poses significant challenges to existing methods in terms of the index maintenance cost and query processing time. In this paper, we propose a scalable integrated inverted index, named I3, which adopts the Quadtree structure to hierarchically partition the data space into cells. The basic unit of I3 is the keyword cell, which captures the spatial locality of a keyword. Moreover, we design a new storage mechanism for efficient retrieval of keyword cell and preserve additional summary information to facilitate pruning. Experiments conducted on real spatial datasets (Twitter and Wikipedia) demonstrate the superiority of I3 over existing schemes such as IR-tree and S2I in various aspects: it incurs shorter construction time to build the index, it has lower index storage cost, it is order of magnitude faster in updates, and it is highly scalable and answers top-k spatial keyword queries efficiently.