Improving online aggregation performance for skewed data distribution

  • Authors:
  • Yuxiang Wang;Junzhou Luo;Aibo Song;Jiahui Jin;Fang Dong

  • Affiliations:
  • School of Computer Science and Engineering, Southeast University, Nanjing, P.R. China (all authors)

  • Venue:
  • DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I
  • Year:
  • 2012

Abstract

Online aggregation is a commonly used technique for answering aggregation queries quickly with progressively refined approximate answers, each reported within an estimated confidence interval. However, we observe that low selectivity and an inappropriate sample proportion significantly degrade online aggregation performance when the data distribution is skewed. To overcome this problem, we propose a Partition-based Online Aggregation System called POAS. In POAS, partition and shuffle strategies reduce the side effect of low selectivity by efficiently pruning unneeded data, and an appropriate sample proportion is approached as closely as possible by drawing samples (tuples) from the relevant partitions with a dynamic sample size. Moreover, POAS applies statistical approaches to calculate estimates from the relevant partitions. We have implemented POAS and conducted an extensive experimental study on the TPC-H benchmark with skewed data distributions. Our results demonstrate the efficiency and effectiveness of POAS.
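
To make the idea of a progressively refined answer with a confidence interval concrete, the following is a minimal sketch of generic online aggregation for an AVG query. It is not the POAS algorithm: it uses plain uniform random sampling with no partitioning, pruning, or dynamic sample size, and it relies on a normal approximation without a finite-population correction. The function name `online_avg` and its parameters `batch_size`, `z`, and `seed` are hypothetical and chosen only for illustration.

```python
import math
import random


def online_avg(values, batch_size=100, z=1.96, seed=0):
    """Yield (tuples_seen, estimate, half_width) after each batch.

    Illustrative only: a shuffled scan stands in for uniform random
    sampling, and z=1.96 gives an approximate 95% interval.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # random scan order approximates random sampling

    n, mean, m2 = 0, 0.0, 0.0  # Welford running mean and squared deviations
    for i, x in enumerate(order, start=1):
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)

        # Report a refined estimate at each batch boundary and at the end.
        if i % batch_size == 0 or i == len(order):
            variance = m2 / (n - 1) if n > 1 else 0.0
            half_width = z * math.sqrt(variance / n)
            yield n, mean, half_width


if __name__ == "__main__":
    # Hypothetical skewed column: many small values, a few large ones.
    data = [1.0] * 900 + [1000.0] * 100
    for seen, estimate, half_width in online_avg(data):
        print(f"{seen:5d} tuples: AVG ~ {estimate:.2f} +/- {half_width:.2f}")
```

On a skewed column such as the one in this usage example, the early intervals are typically wide because a handful of large values dominates the variance. That is the kind of behaviour the abstract attributes to an inappropriate sample proportion under skew.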