Top-k query evaluation with probabilistic guarantees

Authors:
Martin Theobald;Gerhard Weikum;Ralf Schenkel
Affiliations:
Max-Planck Institute of Computer Science, Saarbruecken, Germany;Max-Planck Institute of Computer Science, Saarbruecken, Germany;Max-Planck Institute of Computer Science, Saarbruecken, Germany
Venue:
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Year:
2004

Abstract

Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for evaluating top-k queries is Fagin's threshold algorithm (TA). Since the user's goal behind top-k queries is to identify one or a few relevant and novel data items, it is intriguing to use approximate variants of TA to reduce run-time costs. This paper introduces a family of approximate top-k algorithms based on probabilistic arguments. When scanning index lists of the underlying multidimensional data space in descending order of local scores, various forms of convolution and derived bounds are employed to predict when it is safe, with high probability, to drop candidate items and to prune the index scans. The precision and the efficiency of the developed methods are experimentally evaluated based on a large Web corpus and a structured data collection.
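The idea of probabilistic pruning can be illustrated with a minimal sketch. It is not the paper's algorithm family, only a simplified stand-in under stated assumptions: scores are aggregated by summation, random access to every list is available (as in the basic threshold algorithm), and the local score of an unseen item in list i is assumed to be independent and uniformly distributed on [0, high_i], where high_i is the score at the current scan position. The names `convolve`, `prob_sum_exceeds`, `top_k`, `epsilon`, and `step` are hypothetical and chosen for this illustration.

```python
import heapq

def convolve(p, q):
    """Discrete convolution of two probability vectors (distribution of a sum)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def prob_sum_exceeds(highs, delta, step=0.01):
    """Estimate P[S_1 + ... + S_m > delta], ASSUMING independent
    S_i ~ Uniform[0, high_i]. Each S_i is discretized into buckets of width
    `step`; every bucket is represented by its upper edge, so the estimate
    errs on the high (cautious) side."""
    dist, m = [1.0], 0
    for high in highs:
        n = round(high / step)
        if n <= 0:
            continue  # this list can contribute (about) nothing
        dist = convolve(dist, [1.0 / n] * n)
        m += 1
    return sum(p for t, p in enumerate(dist) if (t + m) * step > delta)

def top_k(lists, k, epsilon=0.0, step=0.01):
    """lists: one list per dimension of (item, score) pairs, sorted by
    descending score. With epsilon == 0 this is the exact threshold algorithm;
    with epsilon > 0 the scan also stops once an unseen item is estimated to
    beat the current k-th score with probability below epsilon.
    Returns (top-k (item, score) pairs, number of scan steps)."""
    lookup = [dict(lst) for lst in lists]  # simulates random access
    scores, top, depth = {}, [], -1
    for depth in range(max(len(lst) for lst in lists)):
        highs = []
        for lst in lists:
            if depth < len(lst):
                item, score = lst[depth]
                highs.append(score)
                if item not in scores:  # fetch the full score on first sight
                    scores[item] = sum(d.get(item, 0.0) for d in lookup)
            else:
                highs.append(0.0)  # list exhausted
        top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        if len(top) == k:
            min_k = top[-1][1]
            if sum(highs) <= min_k:  # exact TA stopping condition
                break
            if epsilon > 0 and prob_sum_exceeds(highs, min_k, step) < epsilon:
                break  # probabilistic early stop (approximate result)
    return top, depth + 1
```

Calling `top_k(lists, k)` gives the exact result, while `top_k(lists, k, epsilon=0.1)` may stop scanning earlier at the risk of missing a true top-k item. The uniform-score assumption is the crudest possible choice; per-list histograms could be plugged into `prob_sum_exceeds` in the same way.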