The effectiveness of GlOSS for the text database discovery problem

Authors:
Luis Gravano;Héctor García-Molina;Anthony Tomasic
Affiliations:
Stanford University, Computer Science Dept., Margaret Jacks Hall, Stanford, CA;Stanford University, Computer Science Dept., Margaret Jacks Hall, Stanford, CA;Stanford University, Computer Science Dept., Margaret Jacks Hall, Stanford, CA and Princeton University, Department of Computer Science
Venue:
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Year:
1994

Abstract

The popularity of on-line document databases has led to a new problem: finding which text databases (out of many candidate choices) are the most relevant to a user's query. Identifying the relevant databases for a given query is the text database discovery problem. The first part of this paper presents a practical solution based on estimating the size of a query's result at each database. The method is termed GlOSS (Glossary-of-Servers Server). The second part of this paper evaluates the effectiveness of GlOSS using a trace of real user queries. In addition, we analyze the storage cost of our approach.
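The abstract does not spell out how GlOSS estimates result sizes. The following is a minimal sketch of the general idea, under stated assumptions: each database exports a summary with its document count and per-term document frequencies, the query is a conjunction (AND) of terms, and terms occur independently of each other. Under independence, the estimated number of matches at a database with N documents is N × Π(fᵢ / N), where fᵢ is the number of documents containing term i. This is one plausible estimator, not necessarily the exact formula used in the paper. The function names and the sample summaries are hypothetical.

```python
from math import prod


def estimate_result_size(query_terms, doc_freq, num_docs):
    """Estimate how many documents in one database match an AND query.

    Assumes term occurrences are independent, so the estimate is
    num_docs * prod(f_i / num_docs), computed here as
    prod(f_i) / num_docs ** (n - 1) to keep a single division.
    """
    if num_docs <= 0:
        return 0.0
    if not query_terms:
        # An empty conjunction matches every document.
        return float(num_docs)
    freqs = [doc_freq.get(term, 0) for term in query_terms]
    return prod(freqs) / num_docs ** (len(freqs) - 1)


def rank_databases(query_terms, summaries):
    """Rank databases by estimated result size, dropping zero estimates."""
    scores = {
        name: estimate_result_size(query_terms, s["doc_freq"], s["num_docs"])
        for name, s in summaries.items()
    }
    ranked = sorted(
        (name for name, score in scores.items() if score > 0),
        key=lambda name: scores[name],
        reverse=True,
    )
    return [(name, scores[name]) for name in ranked]


# Hypothetical per-database summaries (document count plus term frequencies).
summaries = {
    "db_a": {"num_docs": 1000, "doc_freq": {"database": 100, "discovery": 50}},
    "db_b": {"num_docs": 500, "doc_freq": {"database": 200, "discovery": 10}},
    "db_c": {"num_docs": 800, "doc_freq": {"database": 40}},
}

print(rank_databases(["database", "discovery"], summaries))
# prints [('db_a', 5.0), ('db_b', 4.0)]
```

Here db_c is dropped because it contains no document with "discovery", so no document can match the conjunction. The broker only needs the compact summaries, not the documents themselves, which is why the storage cost of those summaries matters.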