Query sampling in DB2 Universal Database

  • Authors:
  • Jarek Gryz; Junjie Guo; Linqi Liu; Calisto Zuzarte

  • Affiliations:
  • York University; York University; IBM Toronto Lab; IBM Toronto Lab

  • Venue:
  • SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
  • Year:
  • 2004

Abstract

Executing ad hoc queries against large databases can be prohibitively expensive. Exploratory analysis of data may not require exact answers to queries, however: results based on sampling the data are often satisfactory. Supporting sampling as a primitive SQL operator turns out to be difficult because sampling does not commute with many SQL operators. In this paper, we describe an implementation in IBM® DB2® Universal Database (UDB) of a sampling operator that commutes with some SQL operators. As a result, the query with the sampling operator always returns a random sample of the answers and in many cases runs faster than it would have without such an operator.
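To illustrate what "commutes" means here, the following minimal Python sketch assumes plain row-level Bernoulli sampling (each row kept independently with probability p). It is a generic illustration of the commutativity problem, not the DB2 implementation described in the paper; all helper names (`keep`, `sample_rows`, `select`, `equijoin`, `pair_inclusion`) are hypothetical. It checks that sampling and selection can be applied in either order with the same result, and estimates why sampling both inputs of a join does not yield a uniform random sample of the join.

```python
import random


def keep(row, p, seed):
    """Deterministic per-row Bernoulli coin: a given row always gets the same decision."""
    return random.Random(f"{seed}:{row!r}").random() < p


def sample_rows(rows, p, seed):
    """Row-level Bernoulli sample: each row kept independently with probability p."""
    return [row for row in rows if keep(row, p, seed)]


def select(rows, pred):
    """Relational selection (a WHERE clause)."""
    return [row for row in rows if pred(row)]


def equijoin(left, right):
    """Nested-loop equijoin on the first column, returning (l, r) pairs."""
    return [(l, r) for l in left for r in right if l[0] == r[0]]


def is_big(row):
    """Hypothetical predicate on the second column."""
    return row[1] >= 4


# 1) Selection commutes with Bernoulli sampling: in either order, a row survives
#    iff it satisfies the predicate AND its own coin flip says keep.
orders = [(i, i % 7) for i in range(1000)]
sample_then_select = select(sample_rows(orders, 0.1, seed=42), is_big)
select_then_sample = sample_rows(select(orders, is_big), 0.1, seed=42)
print(sample_then_select == select_then_sample)  # True


# 2) Join does not commute: sampling both inputs at rate p keeps each join pair
#    with probability p*p, but pairs sharing an input row survive or vanish
#    together, so the output is not a Bernoulli sample of the join at rate p*p.
def pair_inclusion(p, trials, seed):
    rng = random.Random(seed)  # fresh coin flips per trial (not the keyed coin above)
    R = [(1, "r")]
    S = [(1, "s1"), (1, "s2")]
    full_join = equijoin(R, S)  # two result pairs that share the row (1, "r")
    first = both_joined = both_sampled = 0
    for _ in range(trials):
        r_s = [row for row in R if rng.random() < p]
        s_s = [row for row in S if rng.random() < p]
        out = equijoin(r_s, s_s)
        first += full_join[0] in out
        both_joined += len(out) == 2
        # Baseline: Bernoulli-sample the join result itself at rate p*p.
        both_sampled += all(rng.random() < p * p for _ in full_join)
    return first / trials, both_joined / trials, both_sampled / trials


# Theoretical values for p = 0.5: p^2 = 0.25, p^3 = 0.125, p^4 = 0.0625;
# the Monte Carlo estimates printed below should be close to these.
print(pair_inclusion(0.5, 20000, seed=7))
```

The first check holds because each row's keep/drop decision never looks at the predicate. The second shows the correlation problem: each join result row still appears with probability p², but two result rows that share an input row appear together with probability p³ rather than the p⁴ a true Bernoulli sample of the join would give.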