Using Loglinear Models to Compress Datacube

  • Authors:
  • Daniel Barbará;Xintao Wu

  • Venue:
  • WAIM '00 Proceedings of the First International Conference on Web-Age Information Management
  • Year:
  • 2000

Abstract

A data cube is a popular organization for summary data. A cube is simply a multidimensional structure that contains in each cell an aggregate value, i.e., the result of applying an aggregate function to an underlying relation. In practical situations, cubes can require a large amount of storage, so compressing them is of practical importance. In this paper, we propose an approximation technique that reduces the storage cost of the cube at the price of getting approximate answers for the queries posed against the cube. The idea is to characterize regions of the cube by using statistical models whose description takes less space than the data itself. Then, the model parameters can be used to estimate the cube cells with a certain level of accuracy. To increase the accuracy, and to guarantee the level of error in the query answers, some of the "outliers" (i.e., cells that incur the largest errors when estimated) are retained. The storage taken by the model parameters and the retained cells, of course, should take a fraction of the space of the full cube, and the estimation procedure should be faster than computing the data from the underlying relations. We use loglinear models to model the cube regions. Experiments show that the errors introduced in typical queries are small even when the description is substantially smaller than the full cube. The models also offer information about the underlying structure of the data modeled by them. Moreover, these models are relatively easy to update dynamically as data is added to the warehouse.
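
The following sketch illustrates the general idea on a single two-dimensional cube region. The abstract does not say which loglinear model the paper fits, so this is not the authors' procedure. The sketch assumes the simplest case, the independence model log m_ij = u + u_i + u_j, whose fitted values are (row total × column total) / grand total. It also assumes the region has a positive grand total. The names `compress_region` and `estimate_cell` are hypothetical.

```python
def compress_region(cells, n_outliers):
    """Summarize a 2-D region (list of rows) by an independence loglinear model.

    Assumption: the independence model is used here only as an illustration;
    the region's grand total must be positive.
    """
    row_totals = [sum(row) for row in cells]
    col_totals = [sum(col) for col in zip(*cells)]
    total = sum(row_totals)

    def fitted(i, j):
        # Fitted value of the independence model for cell (i, j).
        return row_totals[i] * col_totals[j] / total

    # Rank cells by absolute estimation error, largest first.
    errors = [
        (abs(cells[i][j] - fitted(i, j)), i, j)
        for i in range(len(cells))
        for j in range(len(cells[0]))
    ]
    errors.sort(reverse=True)

    # Retain the exact values of the worst-estimated cells (the "outliers").
    outliers = {(i, j): cells[i][j] for _, i, j in errors[:n_outliers]}
    return {
        "row_totals": row_totals,
        "col_totals": col_totals,
        "total": total,
        "outliers": outliers,
    }


def estimate_cell(model, i, j):
    """Answer a cell query from the stored model and the retained outliers."""
    if (i, j) in model["outliers"]:
        return model["outliers"][(i, j)]
    return model["row_totals"][i] * model["col_totals"][j] / model["total"]
```

For an r × c region, this description stores r + c + 1 model values plus the retained cells instead of r × c cell values, so the saving depends on how many outliers must be kept to meet the error target.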