Being picky: processing top-k queries with set-defined selections

Authors:
Aleksandar Stupar;Sebastian Michel
Affiliations:
Saarland University, Saarbruecken, Germany;Saarland University, Saarbruecken, Germany
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Abstract

Focusing on the top-K items according to a ranking criterion is an important functionality in many query answering scenarios. The idea is to read only the necessary information, mostly from secondary storage, with the ultimate goal of achieving low latency. In this work, we consider processing such top-K queries under the constraint that the result items are members of a specific set, which is provided at query time. We call this restriction a set-defined selection criterion. Set-defined selections drastically shift the trade-off between an id-ordered index and a score-ordered index. We present a mathematical model that decides at runtime which index to use, leading to a combined index. To improve latency around the break-even point of the two indices, we show how to benefit from a partitioned score-ordered index and present an algorithm that creates such partitions by analyzing query logs. Further performance gains can be achieved by using approximate top-K results with tunable result quality. The presented approaches are evaluated on both real-world and synthetic data.
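
To make the trade-off between the two index types concrete, the following is a minimal Python sketch, not the authors' implementation. The function names (`topk_score_ordered`, `topk_id_ordered`, `choose_index`), the in-memory data layout, and the `random_access_cost` parameter are all hypothetical. In particular, the cost estimate in `choose_index` is an illustrative stand-in that assumes the members of the selection set are spread uniformly over the score-ordered list; it is not the mathematical model presented in the paper.

```python
import heapq


def topk_score_ordered(score_index, selection, k):
    """Scan (item_id, score) pairs sorted by descending score.

    Stops as soon as k items from the selection set have been seen.
    Returns the result list and the number of index entries read.
    """
    result = []
    scanned = 0
    for item_id, score in score_index:
        scanned += 1
        if item_id in selection:
            result.append((item_id, score))
            if len(result) == k:
                break
    return result, scanned


def topk_id_ordered(id_index, selection, k):
    """Look up the score of every selected item, then keep the k best.

    id_index is modeled as a dict from item id to score.
    Returns the result list and the number of lookups performed.
    """
    candidates = [
        (item_id, id_index[item_id])
        for item_id in selection
        if item_id in id_index
    ]
    top = heapq.nlargest(k, candidates, key=lambda pair: pair[1])
    return top, len(candidates)


def choose_index(n_items, selection_size, k, random_access_cost=1.0):
    """Pick an index using a hypothetical cost estimate (an assumption).

    If the selected items are uniformly spread over a score-ordered list
    of n_items entries, the k-th selected item is expected at position
    k * (n_items + 1) / (selection_size + 1). The id-ordered index needs
    one lookup per selected item, weighted by random_access_cost.
    """
    if selection_size == 0:
        return "id"
    expected_scan = k * (n_items + 1) / (selection_size + 1)
    lookup_cost = selection_size * random_access_cost
    return "score" if expected_scan < lookup_cost else "id"
```

Under this simplified estimate, a small selection set favors the id-ordered index (few lookups, but a long scan before k members turn up in score order), while a large selection set favors the score-ordered index (early termination after reading only a few entries). The two costs meet near the break-even point that the abstract refers to.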