A Probabilistic Approach to Metasearching with Adaptive Probing

Authors:
Zhenyu Liu;Chang Luo;Junghoo Cho;Wesley W. Chu
Venue:
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Year:
2004


Abstract

An ever-increasing amount of valuable information is stored in Web databases, "hidden" behind search interfaces. To save the user the effort of manually exploring each database, metasearchers automatically select the databases most relevant to a user's query. In this paper, we focus on one of the technical challenges in metasearching, namely database selection. Past research uses a pre-collected summary of each database to estimate its "relevancy" to the query, and in many cases makes incorrect database selections. We propose two techniques: probabilistic relevancy modelling and adaptive probing. First, we model the relevancy of each database to a given query as a probability distribution, derived by sampling that database. Using the probabilistic model, the user can explicitly specify a desired level of certainty for database selection. The adaptive probing technique decides which and how many databases to contact in order to satisfy the user's requirement. Our experiments on real Hidden-Web databases indicate that our approach significantly improves the accuracy of database selection at the cost of a small number of database probes.
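
The interplay of the two techniques can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names (`prob_best`, `select_with_probing`), the independence assumption between databases, the small discrete distributions, and the greedy rule for choosing which database to probe are all assumptions made for illustration. Each database's relevancy is a distribution `{value: probability}`; probing a database replaces its distribution with the observed value, and probing stops once some database is the most relevant with at least the requested certainty.

```python
from itertools import product


def prob_best(dists):
    """Return P(database has the highest relevancy) for every database.

    Assumes independent discrete distributions {value: probability}.
    Ties are split evenly among the tied databases.
    """
    names = list(dists)
    win = {n: 0.0 for n in names}
    # Enumerate every joint outcome; fine for a handful of databases.
    for combo in product(*(dists[n].items() for n in names)):
        p = 1.0
        for _, q in combo:
            p *= q
        top = max(v for v, _ in combo)
        tied = [n for n, (v, _) in zip(names, combo) if v == top]
        for n in tied:
            win[n] += p / len(tied)
    return win


def select_with_probing(dists, probe, certainty):
    """Probe databases until one is the best with probability >= certainty.

    `probe(name)` is a hypothetical callback returning the true relevancy.
    Returns (selected database, its probability of being best, probed list).
    """
    dists = {n: dict(d) for n, d in dists.items()}  # do not mutate the input
    probed = []
    while True:
        win = prob_best(dists)
        best = max(win, key=win.get)
        if win[best] >= certainty:
            return best, win[best], probed
        # Greedy heuristic (an assumption): probe the still-uncertain
        # database that is currently most likely to be the best.
        candidates = [n for n in dists if len(dists[n]) > 1]
        if not candidates:
            return best, win[best], probed
        target = max(candidates, key=win.get)
        dists[target] = {probe(target): 1.0}
        probed.append(target)


# Toy data: estimated relevancy distributions and hidden true values.
estimates = {
    "A": {10: 0.4, 50: 0.6},
    "B": {20: 0.6, 40: 0.4},
    "C": {5: 0.9, 15: 0.1},
}
true_relevancy = {"A": 10, "B": 40, "C": 5}

best, prob, probed = select_with_probing(
    estimates, lambda name: true_relevancy[name], certainty=0.9
)
print(best, probed)
```

In this toy data the initial estimates favour A with probability 0.6, which is below the requested 0.9, so the sketch probes A, learns that its true relevancy is low, and then selects B after a single probe. Raising the certainty level generally means more probes; lowering it means fewer.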