GlOSS: text-source discovery over the Internet

Authors:
Luis Gravano; Héctor García-Molina; Anthony Tomasic
Affiliations:
Columbia Univ., New York, NY; Stanford Univ., Stanford, CA; INRIA Rocquencourt, Le Chesnay, France
Venue:
ACM Transactions on Database Systems (TODS)
Year:
1999


Abstract

The dramatic growth of the Internet has created a new problem for users: locating the relevant sources of documents. This article presents a framework for this problem, which we call the text-source discovery problem, and experimentally analyzes a solution to it. Our approach consists of two phases. First, each text source exports its contents to a centralized service. Second, users present queries to the service, which returns an ordered list of promising text sources. This article describes GlOSS (Glossary of Servers Server) in two versions: bGlOSS, which supports a Boolean query retrieval model, and vGlOSS, which supports a vector-space retrieval model. We also present hGlOSS, a decentralized version of the system. We describe in detail the methodology for measuring the retrieval effectiveness of these systems and provide experimental evidence, based on actual data, that all three systems are highly effective in determining promising text sources for a given query.
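
The two-phase approach described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the article's implementation. The abstract does not specify what a source exports or how sources are scored, so the summary format (a document count plus per-term document frequencies), the term-independence estimate for conjunctive Boolean queries, and all names (`SourceSummary`, `estimate_matches`, `rank_sources`) are hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSummary:
    """Phase one: what a text source exports to the central service.

    Assumed format (not taken from the abstract): the number of documents
    in the source and, for each term, how many documents contain it.
    """
    name: str
    num_docs: int
    doc_freq: dict[str, int] = field(default_factory=dict)


def estimate_matches(summary: SourceSummary, query_terms: list[str]) -> float:
    """Estimate how many documents match the conjunction of query_terms.

    Assumes terms occur independently of one another, so the estimate is
    num_docs * product(doc_freq[t] / num_docs) over the query terms.
    """
    if summary.num_docs == 0:
        return 0.0
    estimate = float(summary.num_docs)
    for term in query_terms:
        estimate *= summary.doc_freq.get(term, 0) / summary.num_docs
    return estimate


def rank_sources(summaries: list[SourceSummary],
                 query_terms: list[str]) -> list[tuple[str, float]]:
    """Phase two: return promising sources, best estimate first.

    Sources whose estimate is zero are left out of the list.
    """
    scored = [(s.name, estimate_matches(s, query_terms)) for s in summaries]
    promising = [(name, est) for name, est in scored if est > 0]
    return sorted(promising, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up summaries for three hypothetical sources.
    summaries = [
        SourceSummary("medline", 1000, {"gene": 200, "therapy": 100}),
        SourceSummary("compsci", 500, {"gene": 5, "database": 300}),
        SourceSummary("news", 2000, {"therapy": 50}),
    ]
    for name, est in rank_sources(summaries, ["gene", "therapy"]):
        print(name, est)
```

In this sketch the service never contacts the sources at query time. It ranks them from the exported summaries alone, which is the point of separating the export phase from the query phase. A vector-space variant would need term weights in the summary rather than plain document frequencies.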