Evaluating top-k queries over web-accessible databases

Authors:
Amélie Marian;Nicolas Bruno;Luis Gravano
Affiliations:
Columbia University, New York, NY;Microsoft Research, Redmond, WA;Columbia University, New York, NY
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2004


Abstract

A query to a web search engine usually consists of a list of keywords, to which the search engine responds with the best or "top" k pages for the query. This top-k query model is prevalent over multimedia collections in general, but also over plain relational data for certain applications. For example, consider a relation with information on available restaurants, including their location, price range for one diner, and overall food rating. A user who queries such a relation might simply specify the user's location and target price range, and expect in return the best 10 restaurants in terms of some combination of proximity to the user, closeness of match to the target price range, and overall food rating. Processing top-k queries efficiently is challenging for a number of reasons. One critical reason is that, in many web applications, the relation attributes might not be available other than through external web-accessible form interfaces, which we will have to query repeatedly for a potentially large set of candidate objects. In this article, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present a sequential algorithm for processing such queries, but observe that any sequential top-k query processing strategy is bound to require unnecessarily long query processing times, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. We adapt our sequential query processing technique and introduce an efficient algorithm that maximizes source-access parallelism to minimize query response time, while satisfying source-access constraints. We evaluate our techniques experimentally using both synthetic and real web-accessible data and show that parallel algorithms can be significantly more efficient than their sequential counterparts.
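
To make the setting in the abstract concrete, the following is a minimal Python sketch of a sequential top-k strategy that probes attribute sources only as needed. It is an illustration under stated assumptions, not the algorithm presented in the article: the function name `top_k`, the weighted-sum scoring function, the weights, and the restaurant data are all hypothetical, and every attribute score is assumed to lie in [0, 1]. Each object carries an upper bound on its final score (unprobed attributes are assumed to score 1.0). The object with the highest upper bound is probed next, and an object is returned once it is fully probed and still at the top, since no other object can then exceed it.

```python
import heapq

def top_k(objects, sources, weights, k):
    """Return (top-k list of (object, score), number of probes issued).

    objects: hashable, mutually comparable ids (hypothetical data).
    sources: one callable per attribute, id -> score in [0, 1]; each call
             stands in for one probe of an external web source.
    weights: non-negative weights of an assumed weighted-sum scoring function.
    """
    known = {o: [] for o in objects}  # attribute scores probed so far
    probes = 0

    def upper(o):
        # Exact contribution of probed attributes, plus the best case
        # (score 1.0) for every attribute not probed yet.
        got = known[o]
        return sum(w * s for w, s in zip(weights, got)) + sum(weights[len(got):])

    # heapq is a min-heap, so store negated upper bounds.
    heap = [(-upper(o), o) for o in objects]
    heapq.heapify(heap)
    result = []
    while heap and len(result) < k:
        neg_bound, o = heapq.heappop(heap)
        if len(known[o]) == len(sources):
            # Fully probed: its exact score is at least every remaining
            # upper bound, so it belongs in the answer.
            result.append((o, -neg_bound))
            continue
        next_source = sources[len(known[o])]
        known[o].append(next_source(o))  # one (simulated) web probe
        probes += 1
        heapq.heappush(heap, (-upper(o), o))
    return result, probes

# Hypothetical restaurant data: each dict simulates one external source.
rating = {"a": 0.9, "b": 0.5, "c": 0.2, "d": 0.8}
proximity = {"a": 0.8, "b": 0.4, "c": 0.9, "d": 0.9}
price_match = {"a": 0.7, "b": 0.9, "c": 0.9, "d": 0.6}
sources = [rating.__getitem__, proximity.__getitem__, price_match.__getitem__]

best, probes = top_k(list(rating), sources, weights=[0.5, 0.3, 0.2], k=2)
print([name for name, _ in best], probes)
```

On this toy input the sketch returns restaurants "a" and "d" while issuing fewer probes than the 12 needed to fetch every attribute of every restaurant. A parallel variant in the spirit of the abstract would issue several such probes concurrently, subject to per-source limits; that extension is not shown here.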