Data structures for efficient broker implementation

Authors:
Anthony Tomasic;Luis Gravano;Calvin Lue;Peter Schwarz;Laura Haas
Affiliations:
INRIA Le Chesnay, France;Stanford Univ., Stanford, CA;IBM Almaden Research, San Jose, CA;IBM Almaden Research, San Jose, CA;IBM Almaden Research, San Jose, CA
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
1997

References:
25
Cited by:
14

Abstract

With the profusion of text databases on the Internet, it is becoming increasingly hard to find the most useful databases for a given query. To attack this problem, several existing and proposed systems employ brokers to direct user queries, using a local database of summary information about the available databases. This summary information must effectively distinguish relevant databases and must be compact while allowing efficient access. We offer evidence that one broker, GlOSS, can be effective at locating databases of interest even in a system of hundreds of databases, and we examine the performance of accessing the GlOSS summaries for two promising storage methods: the grid file and partitioned hashing. We show that both methods can be tuned to provide good performance for a particular workload (within a broad range of workloads), and we discuss the tradeoffs between the two data structures. As a side effect of our work, we show that grid files are more broadly applicable than previously thought; in particular, we show that by varying the policies used to construct the grid file we can provide good performance for a wide range of workloads even when storing highly skewed data.
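To make the storage question concrete, here is a minimal Python sketch of partitioned hashing applied to broker summaries. It rests on several assumptions. Each summary record is taken to be a (word, database) pair mapped to a document frequency df. Queries are assumed to look records up by word, and summary updates by database. The bucket address follows textbook partitioned hashing: it concatenates `word_bits` hash bits of the word with `db_bits` hash bits of the database name. The ranking function uses a simple independence estimate, |db| × Π df(db, t) / |db| over the AND-ed query words, in the spirit of GlOSS's Ind estimator. All names (`hash_bits`, `PartitionedHashFile`, `rank_databases`, `word_bits`, `db_bits`) and the sample data are hypothetical. This is not the paper's implementation, and the grid file alternative is not shown.

```python
# Hypothetical sketch: names, record layout, and data are illustrative
# assumptions, not the paper's code.
import zlib
from collections import defaultdict


def hash_bits(value: str, nbits: int) -> int:
    """Low `nbits` bits of a deterministic CRC-32 hash (stable across runs)."""
    return zlib.crc32(value.encode("utf-8")) & ((1 << nbits) - 1)


class PartitionedHashFile:
    """Toy partitioned-hash file holding (word, database) -> df records.

    A bucket number concatenates `word_bits` hash bits of the word with
    `db_bits` hash bits of the database name, giving 2 ** (word_bits + db_bits)
    buckets in total.
    """

    def __init__(self, word_bits: int, db_bits: int) -> None:
        self.word_bits = word_bits
        self.db_bits = db_bits
        self.buckets = defaultdict(dict)  # bucket number -> {(word, db): df}

    def _bucket(self, word_part: int, db_part: int) -> int:
        return (word_part << self.db_bits) | db_part

    def insert(self, word: str, db: str, df: int) -> None:
        b = self._bucket(hash_bits(word, self.word_bits), hash_bits(db, self.db_bits))
        self.buckets[b][(word, db)] = df

    def lookup_word(self, word: str):
        """Partial match on the word only: every database part must be probed."""
        w = hash_bits(word, self.word_bits)
        hits, probed = {}, 0
        for d in range(1 << self.db_bits):  # 2 ** db_bits buckets
            probed += 1
            for (wd, db), df in self.buckets.get(self._bucket(w, d), {}).items():
                if wd == word:  # filter out hash collisions
                    hits[db] = df
        return hits, probed

    def lookup_db(self, db: str):
        """Partial match on the database only: every word part must be probed."""
        d = hash_bits(db, self.db_bits)
        hits, probed = {}, 0
        for w in range(1 << self.word_bits):  # 2 ** word_bits buckets
            probed += 1
            for (wd, dbn), df in self.buckets.get(self._bucket(w, d), {}).items():
                if dbn == db:  # filter out hash collisions
                    hits[wd] = df
        return hits, probed


def rank_databases(query_words, db_sizes, summary):
    """Rank databases by |db| * prod(df(db, t) / |db|) over the query words."""
    per_word = [summary.lookup_word(t)[0] for t in query_words]
    scores = {}
    for db, size in db_sizes.items():
        est = float(size)
        for hits in per_word:
            est *= hits.get(db, 0) / size  # missing word -> estimate of 0
        scores[db] = est
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical summaries for two databases (all numbers are made up).
db_sizes = {"db_a": 1000, "db_b": 200}
summary = PartitionedHashFile(word_bits=3, db_bits=1)
for word, db, df in [("grid", "db_a", 100), ("file", "db_a", 50),
                     ("grid", "db_b", 40), ("file", "db_b", 20)]:
    summary.insert(word, db, df)

ranking = rank_databases(["grid", "file"], db_sizes, summary)
print(ranking)                         # db_a (about 5.0) ahead of db_b (about 4.0)
print(summary.lookup_word("grid")[1])  # 2 buckets probed (2 ** db_bits)
print(summary.lookup_db("db_a")[1])    # 8 buckets probed (2 ** word_bits)
```

In this sketch, the split between `word_bits` and `db_bits` acts as the tuning knob. A lookup by word probes 2^db_bits buckets, while a lookup by database probes 2^word_bits. For a fixed total, a query-heavy workload therefore favors assigning more bits to the word, and an update-heavy one favors the database. This illustrates why such a structure can be tuned to a workload. The paper's own tuning policies are not reproduced here.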