IO-Top-k: index-access optimized top-k query processing

Authors:
Holger Bast;Debapriyo Majumdar;Ralf Schenkel;Martin Theobald;Gerhard Weikum
Affiliations:
Max-Planck-Institut für Informatik, Saarbrücken, Germany (all authors)
Venue:
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Year:
2006

Abstract

Top-k query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data. Top-k queries operate on index lists for a query's elementary conditions and aggregate scores for result candidates. One of the best implementation methods in this setting is the family of threshold algorithms, which aim to terminate the index scans as early as possible based on lower and upper bounds for the final scores of result candidates. This procedure performs sequential disk accesses for sorted index scans, but also has the option of performing random accesses to resolve score uncertainty. This entails scheduling for the two kinds of accesses: 1) the prioritization of different index lists in the sequential accesses, and 2) the decision on when to perform random accesses and for which candidates.

The prior literature has studied some of these scheduling issues, but only for each of the two access types in isolation. The current paper takes an integrated view of the scheduling issues and develops novel strategies that outperform prior proposals by a large margin. Our main contributions are new, principled scheduling methods based on a Knapsack-related optimization for sequential accesses and a cost model for random accesses. The methods can be further boosted by harnessing probabilistic estimators for scores, selectivities, and index list correlations. In performance experiments with three different datasets (TREC Terabyte, HTTP server logs, and IMDB), our methods achieved significant performance gains compared to the best previously known methods.
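To make the early-termination idea concrete, the following is a minimal sketch of a threshold-style scan that uses only sequential (sorted) accesses and stops once lower and upper score bounds settle the top-k set. It is not the scheduling method of this paper: the function name `nra_top_k`, the plain round-robin access order, and summation of non-negative scores as the aggregation function are all assumptions chosen for illustration.

```python
import heapq


def nra_top_k(index_lists, k):
    """Illustrative threshold-style top-k using sorted accesses only.

    index_lists: list of lists of (doc_id, score), each sorted by descending
    score, with each doc_id appearing at most once per list. Scores are assumed
    non-negative and are aggregated by summation (an assumption of this sketch).
    Returns (top_k, accesses): top_k is a list of (doc_id, lower_bound) pairs,
    accesses is the number of index entries read.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    m = len(index_lists)
    seen = {}          # doc_id -> {list index: score read from that list}
    high = [0.0] * m   # per list: upper bound on any score not yet read
    accesses = 0
    depth = max((len(lst) for lst in index_lists), default=0)

    def lower(doc):
        # Lower bound: sum of the scores already read for this candidate.
        return sum(seen[doc].values())

    def upper(doc):
        # Upper bound: lower bound plus the best possible unread scores.
        return lower(doc) + sum(high[i] for i in range(m) if i not in seen[doc])

    for pos in range(depth):
        # One round-robin step: read the next entry of every list.
        for i, lst in enumerate(index_lists):
            if pos < len(lst):
                doc, score = lst[pos]
                seen.setdefault(doc, {})[i] = score
                high[i] = score
                accesses += 1
            else:
                high[i] = 0.0  # list exhausted, nothing left to read
        if len(seen) < k:
            continue
        top = heapq.nlargest(k, seen, key=lower)
        min_k = lower(top[-1])
        top_set = set(top)
        # Stop when neither an unseen document nor any other candidate
        # can still exceed the k-th best lower bound.
        if sum(high) <= min_k and all(
            upper(d) <= min_k for d in seen if d not in top_set
        ):
            break

    top = heapq.nlargest(k, seen, key=lower)
    return [(d, lower(d)) for d in top], accesses


# Hypothetical example data: two index lists over four documents.
list_a = [("d1", 0.9), ("d2", 0.8), ("d3", 0.3), ("d4", 0.1)]
list_b = [("d2", 0.9), ("d1", 0.7), ("d4", 0.2), ("d3", 0.1)]
result, accesses = nra_top_k([list_a, list_b], k=1)
```

On this toy input the scan stops after reading two entries from each list rather than all eight, because no other document can overtake the current best. The returned scores are lower bounds and need not equal the final aggregated scores. Resolving that remaining uncertainty is where random accesses come in, and deciding which list to read next and when to probe is the scheduling problem the abstract describes.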