A Probabilistic Approach to Metasearching with Adaptive Probing

  • Authors:
  • Zhenyu Liu; Chang Luo; Junghoo Cho; Wesley W. Chu

  • Venue:
  • ICDE '04 Proceedings of the 20th International Conference on Data Engineering
  • Year:
  • 2004

Abstract

An ever-increasing amount of valuable information is stored in Web databases, "hidden" behind search interfaces. To save the user the effort of manually exploring each database, metasearchers automatically select the databases most relevant to a user's query. In this paper, we focus on one of the technical challenges in metasearching, namely database selection. Past research uses a pre-collected summary of each database to estimate its "relevancy" to the query, and in many cases makes incorrect database selections. We propose two techniques: probabilistic relevancy modelling and adaptive probing. First, we model the relevancy of each database to a given query as a probability distribution, derived by sampling that database. Using the probabilistic model, the user can explicitly specify a desired level of certainty for database selection. The adaptive probing technique then decides which databases, and how many, to contact in order to satisfy the user's requirement. Our experiments on real Hidden-Web databases indicate that our approach significantly improves the accuracy of database selection at the cost of a small number of database probes.
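
The interplay between the probabilistic model and adaptive probing can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the relevancy model or the probing policy, so everything below is an assumption made for illustration. Each database's relevancy is a small discrete distribution (value to probability), the databases are treated as independent, only the single best database is selected, and the next database to probe is chosen greedily by highest variance. The names `best_with_certainty`, `variance`, `select_database`, and `probe` are hypothetical.

```python
from itertools import product


def best_with_certainty(dists):
    """Return (name, probability) for the database most likely to have
    the highest relevancy, assuming independent distributions."""
    names = list(dists)
    win = {name: 0.0 for name in names}
    # Enumerate every joint outcome of the discrete distributions.
    for combo in product(*(dists[name].items() for name in names)):
        prob = 1.0
        for _, p in combo:
            prob *= p
        values = [value for value, _ in combo]
        # Ties go to the database listed first (an arbitrary choice).
        winner = names[values.index(max(values))]
        win[winner] += prob
    best = max(names, key=lambda name: win[name])
    return best, win[best]


def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    return sum(p * (v - mean) ** 2 for v, p in dist.items())


def select_database(dists, probe, threshold):
    """Probe databases until the top choice meets the certainty threshold.

    `probe(name)` is a hypothetical callback that contacts the database
    and returns its actual relevancy for the query.
    """
    dists = {name: dict(d) for name, d in dists.items()}  # do not mutate input
    probed = []
    while True:
        best, certainty = best_with_certainty(dists)
        if certainty >= threshold:
            return best, certainty, probed
        candidates = [name for name in dists if name not in probed]
        if not candidates:
            return best, certainty, probed
        # Greedy heuristic (assumption): probe the most uncertain database.
        target = max(candidates, key=lambda name: variance(dists[name]))
        dists[target] = {probe(target): 1.0}  # probing removes the uncertainty
        probed.append(target)
```

Called with estimated distributions, a probe callback, and a threshold such as 0.9, `select_database` returns the chosen database, the certainty reached, and the list of databases it had to probe. A lower threshold generally means fewer probes, which is the trade-off the abstract describes.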