An ever-increasing amount of valuable information is stored in Web databases, "hidden" behind search interfaces. To save the user the effort of manually exploring each database, metasearchers automatically select the databases most relevant to a user's query. In this paper, we focus on one of the technical challenges in metasearching, namely database selection. Past research uses a pre-collected summary of each database to estimate its "relevancy" to the query, and in many cases selects the wrong databases. We propose two techniques: probabilistic relevancy modelling and adaptive probing. First, we model the relevancy of each database to a given query as a probability distribution, derived by sampling that database. Using the probabilistic model, the user can explicitly specify a desired level of certainty for database selection. Second, the adaptive probing technique decides which and how many databases to contact in order to satisfy the user's requirement. Our experiments on real Hidden-Web databases indicate that our approach significantly improves the accuracy of database selection at the cost of a small number of database probes.
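
The abstract does not specify the estimator or the probing policy, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's method. It assumes each database's relevancy is given as a small discrete distribution of `(value, probability)` pairs, that the goal is to pick the single most relevant database, and that the next database to probe is simply the unprobed one with the largest variance. The names `prob_best`, `variance`, `select_database`, and the `probe` callback are hypothetical.

```python
from itertools import product


def prob_best(dists):
    """For each database, the probability that it has the highest relevancy.

    `dists[i]` is a list of (relevancy, probability) pairs (assumed format).
    Databases are treated as independent; ties are split evenly.
    """
    win = [0.0] * len(dists)
    for combo in product(*dists):  # enumerate every joint outcome
        p = 1.0
        for _, q in combo:
            p *= q
        top = max(v for v, _ in combo)
        winners = [i for i, (v, _) in enumerate(combo) if v == top]
        for i in winners:
            win[i] += p / len(winners)
    return win


def variance(dist):
    """Variance of one discrete relevancy distribution."""
    mean = sum(v * q for v, q in dist)
    return sum(q * (v - mean) ** 2 for v, q in dist)


def select_database(dists, probe, threshold):
    """Probe databases until one is the best with probability >= threshold.

    `probe(i)` is a hypothetical callback returning the actual relevancy of
    database i. Returns (best index, certainty, list of probed indices).
    """
    dists = [list(d) for d in dists]  # do not mutate the caller's data
    probed = []
    while True:
        win = prob_best(dists)
        best = max(range(len(dists)), key=win.__getitem__)
        unprobed = [i for i in range(len(dists)) if i not in probed]
        if win[best] >= threshold or not unprobed:
            return best, win[best], probed
        # Assumed greedy policy: probe the most uncertain database next.
        target = max(unprobed, key=lambda i: variance(dists[i]))
        dists[target] = [(probe(target), 1.0)]  # replace estimate with fact
        probed.append(target)
```

For instance, with `dists = [[(10, 0.5), (50, 0.5)], [(30, 1.0)], [(5, 1.0)]]` and a certainty threshold of 0.9, the first database is the best with probability only 0.5, so the sketch probes it before answering. With a threshold of 0.5 it would answer without contacting any database. The exhaustive enumeration in `prob_best` is exponential in the number of databases and is meant only to keep the illustration short.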