Histogram-by: A grouping operator for continuous domains

  • Authors:
  • Seokjin Hong; Jinuk Bae; Taewon Lee; Sukho Lee

  • Affiliations:
  • School of Computer Science and Engineering, Seoul National University, San 56-1, Shillim-dong, Kwanak-gu, Seoul 151-742, Republic of Korea (all four authors)

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2007

Abstract

In this paper, we propose a new operator, histogram-by, which provides grouping over continuous domains: it partitions records into groups according to given ranges of the target attributes. The histogram-by operator can be expressed as a histogram-by clause in an SQL statement and is easily amenable to query optimization. As an application of the histogram-by operator, we introduce the multi-dimensional histogram query, which returns aggregate values for all ranges specified by the histogram-by clause. To process this query efficiently, we propose effective algorithms based on aggregate R-trees. Our experimental results on both synthetic and real-world datasets show that these algorithms perform reliably.
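The abstract does not give the clause syntax or the aggregate R-tree algorithms, so the following is only a minimal Python sketch of what a multi-dimensional histogram query returns. It assumes each target attribute's ranges are given as a sorted list of boundaries forming half-open intervals [b_i, b_{i+1}), and that COUNT and SUM are the aggregates of interest. The names `histogram_by` and `bucket_index` are hypothetical. The sketch evaluates the query with a plain full scan, not with aggregate R-trees, so it illustrates the semantics rather than the paper's efficient evaluation.

```python
from bisect import bisect_right
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, float]
Cell = Tuple[int, ...]


def bucket_index(value: float, bounds: Sequence[float]) -> Optional[int]:
    """Return i with bounds[i] <= value < bounds[i + 1], or None if out of range."""
    i = bisect_right(bounds, value) - 1
    return i if 0 <= i < len(bounds) - 1 else None


def histogram_by(
    records: List[Record],
    ranges: Dict[str, Sequence[float]],
    measure: str,
) -> Dict[Cell, Tuple[int, float]]:
    """Group records by the given ranges; return (COUNT, SUM(measure)) per cell.

    Hypothetical helper: every cell of the range grid is reported, including
    empty ones, since the histogram query returns a value for all ranges.
    """
    attrs = list(ranges)
    # Initialise every grid cell to (count=0, sum=0.0).
    result: Dict[Cell, Tuple[int, float]] = {
        cell: (0, 0.0)
        for cell in product(*(range(len(ranges[a]) - 1) for a in attrs))
    }
    # Full scan (assumption: no index), assigning each record to one cell.
    for rec in records:
        idx = [bucket_index(rec[a], ranges[a]) for a in attrs]
        if any(i is None for i in idx):
            continue  # record lies outside the requested ranges
        cell = tuple(idx)
        count, total = result[cell]
        result[cell] = (count + 1, total + rec[measure])
    return result


if __name__ == "__main__":
    rows = [
        {"age": 23, "income": 30.0, "spend": 5.0},
        {"age": 37, "income": 55.0, "spend": 9.0},
        {"age": 41, "income": 72.0, "spend": 12.0},
        {"age": 29, "income": 48.0, "spend": 7.0},
    ]
    hist = histogram_by(rows, {"age": [20, 30, 40, 50], "income": [0, 50, 100]}, "spend")
    print(hist[(0, 0)])  # (2, 12.0)
```

The scan touches every record once per query. Aggregate R-trees keep aggregate values in their internal nodes, which is what allows an index-based method to answer a range's aggregate without visiting each record individually; the sketch above deliberately omits that machinery.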