An instant and accurate size estimation method for joins and selections in a retrieval-intensive environment

  • Authors:
  • Wei Sun;Yibei Ling;Naphtali Rishe;Yi Deng

  • Affiliations:
  • School of Computer Science, Florida International University, Miami, Florida;School of Computer Science, Florida International University, Miami, Florida;School of Computer Science, Florida International University, Miami, Florida;School of Computer Science, Florida International University, Miami, Florida

  • Venue:
  • SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
  • Year:
  • 1993

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper proposes a novel strategy for estimating the size of the resulting relation after an equi-join and selection using a regression model. An approximating series representing the underlying data distribution and dependency is derived from the actual data. The proposed method provides an instant and accurate size estimation by performing an evaluation of the series, with no run-time overheads in page faults and space, and with negligible CPU overhead. In contrast, the popular sampling methods incur run-time overheads in page faults (for sampling), CPU time and space. These overheads of sampling methods increase the response time of processing a query. The results of a comprehensive experimental study are also reported, which demonstrate that the estimation accuracy by the proposed method is comparable with that of the sampling methods which are believed to provide the most accurate estimation. The proposed method seems ideal for retrieval-intensive database and information systems. Since the overheads involved in deriving the approximating series are fairly moderate, we believe that this method is also an extremely competent method when moderate or periodical updates are present.