The proliferation of text databases within large organizations and on the Internet makes it difficult for a person to know which databases to search. Given language models that describe the contents of each database, a database selection algorithm such as GlOSS can provide assistance by automatically selecting appropriate databases for an information need. Current practice is that each database provides its language model upon request, but this cooperative approach has important limitations. This paper demonstrates that cooperation is not required. Instead, the database selection service can construct its own language models by sampling database contents via the normal process of running queries and retrieving documents. Although random sampling is not possible, it can be approximated with carefully selected queries. This sampling approach avoids the limitations that characterize the cooperative approach, and also enables additional capabilities. Experimental results demonstrate that accurate language models can be learned from a relatively small number of queries and documents.
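The sampling loop described above can be illustrated with a minimal sketch. It is not the paper's exact procedure: the function name `sample_language_model`, the `search` callback, the parameters `max_queries` and `docs_per_query`, and the toy in-memory database are all hypothetical, and the rule for choosing the next query (a random term from the vocabulary learned so far) is an assumption made for illustration.

```python
from collections import Counter
import random


def sample_language_model(search, seed_term, max_queries=50, docs_per_query=4, rng=None):
    """Learn term document-frequencies using only a database's search interface.

    `search(term)` is a hypothetical callback returning a list of document strings.
    """
    rng = rng or random.Random(0)
    model = Counter()  # learned term -> number of sampled documents containing it
    seen = set()       # documents already sampled (avoid double counting)
    tried = set()      # query terms already issued
    query = seed_term
    for _ in range(max_queries):
        tried.add(query)
        # Run the query and examine only the top few returned documents.
        for doc in search(query)[:docs_per_query]:
            if doc in seen:
                continue
            seen.add(doc)
            model.update(set(doc.lower().split()))
        # Assumed selection rule: pick the next query at random from
        # learned terms that have not been used as queries yet.
        candidates = sorted(set(model) - tried)
        if not candidates:
            break
        query = rng.choice(candidates)
    return model, len(seen)


# Toy stand-in for an uncooperative database: it only answers queries.
DOCS = [
    "apple banana cherry",
    "banana cherry date",
    "cherry date elderberry",
    "date elderberry fig",
]


def toy_search(term):
    return [d for d in DOCS if term in d.split()]


model, n_docs = sample_language_model(toy_search, "apple")
```

In this toy setting the sampler reaches every document because each one shares a term with its neighbor. A real database would be sampled only partially, and the learned model would then be an approximation of the true one.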