Distributed top-N query processing with possibly uncooperative local systems

Authors:
Clement Yu; George Philip; Weiyi Meng
Affiliations:
Dept. of Computer Science, U. of Illinois at Chicago, Chicago, IL; Dept. of Computer Science, U. of Illinois at Chicago, Chicago, IL; Dept. of Computer Science, SUNY at Binghamton, Binghamton, NY
Venue:
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Year:
2003

Abstract

We consider the problem of processing top-N queries in a distributed environment with possibly uncooperative local database systems. For a given top-N query, the problem is to efficiently find the N tuples that best satisfy the query, though not necessarily completely. Top-N queries are gaining popularity in relational databases and are expected to be very useful for e-commerce applications. Many companies provide the same type of goods and services to the public on the Web, and relational databases may be employed to manage the data. It is not feasible for a user to query a large number of databases. It is therefore desirable to provide a facility where a user query is accepted at some site, suitable tuples are retrieved from appropriate sites, and the results are merged and presented to the user. In this paper, we present a method for constructing such a facility. Our method consists of two steps. The first step determines which databases are likely to contain the desired tuples for a given query, so that the databases can be ranked by their desirability with respect to the query. Four different techniques are introduced for this step, one of which requires no cooperation from local systems. The second step determines how the ranked databases should be searched and which tuples from the searched databases should be returned. A new algorithm is proposed for this purpose. Experimental results comparing the different methods are presented, and very promising results are obtained using the method that requires no cooperation from local databases.
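
The two-step structure described in the abstract (rank the databases, then search them in rank order and merge the results) can be illustrated with a minimal sketch. This is not the paper's algorithm or any of its four ranking techniques. It assumes, purely for illustration, that each database is summarized by a bounding box over its tuples, that query satisfaction is measured by Euclidean distance to a target point, and that searching stops once no remaining database can improve the current top N. The names `distance`, `lower_bound`, `top_n`, and the sample data are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Row = Tuple[float, ...]
Box = Tuple[Row, Row]  # (lower corner, upper corner); an assumed summary format


def distance(row: Row, target: Row) -> float:
    """Euclidean distance to the query point (an assumed scoring function)."""
    return sum((a - b) ** 2 for a, b in zip(row, target)) ** 0.5


def lower_bound(box: Box, target: Row) -> float:
    """Smallest distance from target to any point inside the bounding box."""
    lo, hi = box
    nearest = tuple(min(max(t, l), h) for t, l, h in zip(target, lo, hi))
    return distance(nearest, target)


def top_n(databases: Dict[str, List[Row]], summaries: Dict[str, Box],
          target: Row, n: int):
    # Step 1: rank databases by the best distance their summary allows.
    ranked = sorted(summaries, key=lambda name: lower_bound(summaries[name], target))

    # Step 2: search in rank order, keeping the n closest tuples seen so far.
    best: List[Tuple[float, Row]] = []  # max-heap on distance via negation
    searched: List[str] = []
    for name in ranked:
        bound = lower_bound(summaries[name], target)
        # Stop when the current n-th best is at least as good as anything
        # this database (and every later one) could possibly contain.
        if len(best) == n and -best[0][0] <= bound:
            break
        searched.append(name)
        # Each local system is asked only for its own n closest tuples.
        for row in heapq.nsmallest(n, databases[name], key=lambda r: distance(r, target)):
            d = distance(row, target)
            if len(best) < n:
                heapq.heappush(best, (-d, row))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, row))
    result = sorted((-neg, row) for neg, row in best)
    return result, searched


# Hypothetical sample data: three small databases and their bounding boxes.
databases = {
    "A": [(1.0, 1.0), (2.0, 2.0), (9.0, 9.0)],
    "B": [(1.5, 1.0), (8.0, 8.0)],
    "C": [(20.0, 20.0), (25.0, 25.0)],
}
summaries = {
    "A": ((1.0, 1.0), (9.0, 9.0)),
    "B": ((1.5, 1.0), (8.0, 8.0)),
    "C": ((20.0, 20.0), (25.0, 25.0)),
}
result, searched = top_n(databases, summaries, target=(1.0, 1.0), n=2)
```

In this sample, database C is never contacted because its bounding box lies farther from the query point than the two tuples already found. How such summaries would be obtained from uncooperative local systems is the question the paper addresses and is outside the scope of this sketch.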