Query-based sampling of text databases

  • Authors:
  • Jamie Callan;Margaret Connell

  • Affiliations:
  • Carnegie Mellon Univ.;Univ. of Massachusetts

  • Venue:
  • ACM Transactions on Information Systems (TOIS)
  • Year:
  • 2001

Quantified Score

Hi-index 0.02

Abstract

The proliferation of searchable text databases on corporate networks and the Internet causes a database selection problem for many people. Algorithms such as gGLOSS and CORI can automatically select which text databases to search for a given information need, but only if given a set of resource descriptions that accurately represent the contents of each database. The existing techniques for acquiring resource descriptions have significant limitations when used in wide-area networks controlled by many parties. This paper presents query-based sampling, a new technique for acquiring accurate resource descriptions. Query-based sampling does not require the cooperation of resource providers, nor does it require that resource providers use a particular search engine or representation technique. An extensive set of experimental results demonstrates that accurate resource descriptions are created, that computation and communication costs are reasonable, and that the resource descriptions do in fact enable accurate automatic database selection.