DAWN: an efficient framework of DCT for data with error estimation

  • Authors:
  • Ming-Jyh Hsieh;Wei-Guang Teng;Ming-Syan Chen;Philip S. Yu

  • Affiliations:
  • Electrical Engineering Department, National Taiwan University, Taipei, Taiwan, ROC;Electrical Engineering Department, National Taiwan University, Taipei, Taiwan, ROC and Department of Engineering Science, National Cheng Kung University, Tainan City, Taiwan, ROC 701;Electrical Engineering Department, National Taiwan University, Taipei, Taiwan, ROC;IBM Thomas J. Watson Research Center, Yorktown, USA 10598

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2008

Abstract

On-line analytical processing (OLAP) has become an important component of most data warehouse systems and decision support systems in recent years. To cope with huge amounts of data, highly complex queries, and increasingly strict response-time requirements, approximate query processing has been deemed a viable solution. Most work in this area, however, focuses on space efficiency and cannot provide quality-guaranteed answers to queries. To remedy this, we propose an efficient framework of DCT for dAta With error estimatioN, called DAWN, which focuses on answering range-sum queries from compressed OP-cubes transformed by DCT. Specifically, using geometric series and Euler's formula, we devise a robust summation function, called the GE function, to answer range queries in constant time, regardless of the number of data cells involved. Note that the GE function can estimate the summation of cosine functions precisely; thus the quality of the answers is superior to that of previous works. Furthermore, an error estimator based on the Brown noise assumption (BNA) is devised to provide tight bounds when answering range-sum queries. Our experimental results show that the DAWN framework scales with respect to both query selectivity and available storage space. With GE functions and the BNA method, the DAWN framework not only delivers high-quality answers to range-sum queries, but also achieves shorter query response times owing to its effectiveness in error estimation.
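
The abstract does not give the GE function itself, so the following is only a minimal one-dimensional sketch of the general idea it describes: a sum of DCT cosine terms over a range has a closed form (obtainable from a geometric series via Euler's formula), so the cost of a range-sum does not depend on how many cells the range covers. The function names `dct_coefficients`, `cosine_range_sum`, and `range_sum_from_dct`, the orthonormal DCT-II scaling, and the sample data are all assumptions made for illustration; they are not taken from the paper, and the paper's multidimensional OP-cube setting and error estimator are not modeled here.

```python
import math


def dct_coefficients(data):
    """Orthonormal DCT-II of a 1-D sequence (naive O(n^2) version, for illustration)."""
    n = len(data)
    coeffs = []
    for u in range(n):
        scale = math.sqrt((1.0 if u == 0 else 2.0) / n)
        total = sum(
            value * math.cos((2 * x + 1) * u * math.pi / (2 * n))
            for x, value in enumerate(data)
        )
        coeffs.append(scale * total)
    return coeffs


def cosine_range_sum(u, a, b, n):
    """Closed form of sum_{x=a}^{b} cos((2x+1) * u * pi / (2n)), inclusive bounds.

    Uses the telescoping identity
        2 sin(t) cos((2x+1) t) = sin((2x+2) t) - sin(2x t),
    which is what the geometric-series / Euler's-formula derivation reduces to.
    """
    if u == 0:
        # Every term is cos(0) = 1.
        return float(b - a + 1)
    theta = u * math.pi / (2 * n)
    return (math.sin(2 * (b + 1) * theta) - math.sin(2 * a * theta)) / (2 * math.sin(theta))


def range_sum_from_dct(coeffs, a, b):
    """Sum of the original cells a..b (inclusive) computed from DCT coefficients only.

    Work is proportional to the number of non-zero coefficients kept,
    not to the number of cells in the range.
    """
    n = len(coeffs)
    total = 0.0
    for u, c in enumerate(coeffs):
        if c == 0.0:
            continue  # dropped (compressed-away) coefficient
        scale = math.sqrt((1.0 if u == 0 else 2.0) / n)
        total += scale * c * cosine_range_sum(u, a, b, n)
    return total


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical cell values
    coeffs = dct_coefficients(data)
    print(range_sum_from_dct(coeffs, 2, 5), sum(data[2:6]))
```

With all coefficients retained, the closed-form answer matches the exact range-sum up to floating-point rounding. Zeroing small coefficients to save space makes the answer approximate, which is the situation an error estimator such as the one described in the abstract is meant to bound.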