Unbiased estimation of size and other aggregates over hidden web databases

  • Authors:
  • Arjun Dasgupta;Xin Jin;Bradley Jewell;Nan Zhang;Gautam Das

  • Affiliations:
  • University of Texas at Arlington, Arlington, TX, USA;George Washington University, Washington, D.C., USA;University of Texas at Arlington, Arlington, TX, USA;George Washington University, Washington, D.C., USA;University of Texas at Arlington, Arlington, TX, USA

  • Venue:
  • Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2010

Abstract

Many websites provide restrictive form-like interfaces which allow users to execute search queries on the underlying hidden databases. In this paper, we consider the problem of estimating the size of a hidden database through its web interface. We propose novel techniques which use a small number of queries to produce unbiased estimates with small variance. These techniques can also be used for approximate query processing over hidden databases. We present theoretical analysis and extensive experiments to illustrate the effectiveness of our approach.
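
To make the idea of an unbiased size estimate from a few queries concrete, here is a minimal sketch of one way such an estimator could work. It is not the paper's algorithm, which the abstract does not describe. It assumes a simulated hidden table of distinct 0/1 tuples behind a top-k interface that reveals at most k matches and flags overflow. The names make_interface, drill_down_estimate, and estimate_size are hypothetical. Each random walk adds one attribute at a time until the query stops overflowing, then divides the number of rows returned by the probability of having reached that query.

```python
import random


def make_interface(rows, k):
    """Simulate a top-k form interface over a hidden table of 0/1 tuples (assumed setup)."""
    def query(prefix):
        # A query fixes the first len(prefix) attributes to the given values.
        matches = [r for r in rows if r[:len(prefix)] == prefix]
        # The interface reveals at most k rows and signals when more exist.
        return len(matches) > k, matches[:k]
    return query


def drill_down_estimate(query, num_attrs, rng):
    """One random walk; returns (rows seen) / (probability of this walk)."""
    prefix, prob = (), 1.0
    while True:
        overflow, visible = query(prefix)
        if not overflow:
            # Empty or fully visible result: weight the count by 1 / prob.
            return len(visible) / prob
        if len(prefix) == num_attrs:
            raise ValueError("more than k identical tuples; sketch assumes distinct rows")
        # Pick the next attribute value uniformly at random.
        prefix += (rng.randint(0, 1),)
        prob *= 0.5


def estimate_size(query, num_attrs, walks, seed=0):
    """Average several independent walks to reduce variance."""
    rng = random.Random(seed)
    total = sum(drill_down_estimate(query, num_attrs, rng) for _ in range(walks))
    return total / walks


if __name__ == "__main__":
    hidden = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
    interface = make_interface(hidden, k=1)
    print(estimate_size(interface, num_attrs=3, walks=1000))
```

Under these assumptions the estimate is unbiased because every tuple lies under exactly one shallowest non-overflowing query, and that query contributes its row count times its reach probability divided by the same probability. Walks that end in an empty result return zero, which inflates variance; variance reduction is the part this sketch leaves out.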