Skip-and-prune: cosine-based top-k query processing for efficient context-sensitive document retrieval

Authors:
Jong Wook Kim; K. Selçuk Candan
Affiliations:
Arizona State University, Tempe, AZ, USA; Arizona State University, Tempe, AZ, USA
Venue:
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Year:
2009

Abstract

Keyword search and ranked retrieval have together emerged as popular data access paradigms for many kinds of data, from web pages to XML and relational databases. A user can submit keywords while knowing little (sometimes nothing) about the complex structure underlying a data collection, yet the system can identify, rank, and return a set of relevant matches by exploiting statistics about the distribution and structure of the data. Keyword-based data models are also well suited to capturing a user's search context in terms of weights associated with the keywords in the query. Given a search context, the data in the database can also be re-interpreted for semantically correct retrieval. This option, however, is often ignored because the cost of naively re-assessing the content of the database tends to be prohibitive. In this paper, we first argue that top-k query processing can help tackle this challenge by efficiently re-assessing only the relevant parts of the database. A roadblock in this process, however, is that most efficient implementations of top-k query processing assume that the scoring function is monotonic, whereas the cosine-based scoring function needed to re-interpret content based on user context is not. We then develop an efficient top-k query processing algorithm, skip-and-prune (SnP), which is able to process top-k queries under cosine-based non-monotonic scoring functions. We compare the proposed algorithm against alternative implementations of context-aware retrieval, including naive top-k, accumulator-based inverted files, and full scan. The experimental results show that, while fast, naive top-k is not an effective solution because the underlying scoring function is non-monotonic. The proposed technique, SnP, however, matches the precision of accumulator-based inverted files and full scan, yet it is orders of magnitude faster than both.
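The obstacle named above is that threshold-style top-k algorithms stop early by assuming a monotonic score, and cosine is not monotonic: raising a document's weight on one keyword can lower its cosine score. The sketch below illustrates this under stated assumptions and is not the authors' SnP algorithm. It assumes "naive top-k" means a Fagin-style threshold algorithm (TA) run unchanged on the cosine score, uses toy two-keyword vectors, and all names (`cosine`, `weighted_sum`, `threshold_top1`, `full_scan_top1`) are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]
Score = Callable[[Vector, Vector], float]


def cosine(q: Vector, d: Vector) -> float:
    """Cosine similarity between query weights q and document vector d."""
    dot = sum(qi * di for qi, di in zip(q, d))
    nq = math.sqrt(sum(qi * qi for qi in q))
    nd = math.sqrt(sum(di * di for di in d))
    return dot / (nq * nd) if nq and nd else 0.0


def weighted_sum(q: Vector, d: Vector) -> float:
    """A monotonic score for comparison: the unnormalized dot product."""
    return sum(qi * di for qi, di in zip(q, d))


def full_scan_top1(q: Vector, docs: Dict[str, Vector], score: Score) -> str:
    """Exact baseline: score every document and keep the best one."""
    return max(docs, key=lambda name: score(q, docs[name]))


def threshold_top1(q: Vector, docs: Dict[str, Vector], score: Score) -> str:
    """Fagin-style threshold algorithm (TA) for k = 1 (hypothetical helper).

    Sorted access walks one descending list per dimension in lockstep;
    each document seen is fully scored via random access. TA stops once
    the best score so far reaches score(q, last values seen), which bounds
    all unseen documents only if `score` is monotonic.
    """
    dims = len(q)
    lists = [sorted(docs, key=lambda n, i=i: docs[n][i], reverse=True)
             for i in range(dims)]
    best, best_score = "", -math.inf
    for depth in range(len(docs)):
        last = []
        for i in range(dims):
            name = lists[i][depth]
            last.append(docs[name][i])
            s = score(q, docs[name])  # random access to the full vector
            if s > best_score:
                best, best_score = name, s
        if best_score >= score(q, last):  # early-termination test
            return best
    return best


if __name__ == "__main__":
    q = (1.0, 1.0)  # hypothetical context weights for two keywords
    # Dominating (1, 1) component-wise does not give a higher cosine score.
    print(cosine(q, (2.0, 1.0)) < cosine(q, (1.0, 1.0)))  # True
    docs = {"A": (1.0, 0.8), "B": (0.2, 0.1), "C": (0.3, 0.3)}
    print(threshold_top1(q, docs, cosine), full_scan_top1(q, docs, cosine))  # A C
    print(threshold_top1(q, docs, weighted_sum),
          full_scan_top1(q, docs, weighted_sum))  # A A
```

With the monotonic weighted sum, stopping after the first round is safe. With cosine, TA returns A, while document C, perfectly aligned with the query but low in both sorted lists, is never reached. Finding such documents without falling back to a full scan is the problem SnP targets; its skipping and pruning rules are not reproduced here.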