Loglinear-Based Quasi Cubes

Authors:
Daniel Barbará; Xintao Wu
Affiliations:
ISE Department, George Mason University, MSN 4A4, Fairfax, VA 22030, USA; ISE Department, George Mason University, MSN 4A4, Fairfax, VA 22030, USA
Venue:
Journal of Intelligent Information Systems
Year:
2001

Abstract

A data cube is a popular organization for summary data. A cube is simply a multidimensional structure that contains in each cell an aggregate value, i.e., the result of applying an aggregate function to an underlying relation. In practice, cubes can require a large amount of storage, so compressing them is important. In this paper, we propose an approximation technique that reduces the storage cost of the cube at the price of returning approximate answers to the queries posed against it. The idea is to characterize regions of the cube using statistical models whose description takes less space than the data itself. The model parameters can then be used to estimate the cube cells with a certain level of accuracy. To increase the accuracy, and to guarantee the level of error in the query answers, some of the “outliers” (i.e., the cells that incur the largest errors when estimated) are retained. The model parameters and the retained cells should, of course, together take only a fraction of the space of the full cube, and the estimation procedure should be faster than computing the data from the underlying relations. We use loglinear models to model the cube regions. Experiments show that the errors introduced in typical queries are small even when the description is substantially smaller than the full cube. Since cubes are used to support data analysis, and analysts are rarely interested in the precise values of the aggregates (but rather in trends), providing approximate answers is, in most cases, a satisfactory compromise. Although other techniques have been used to compress data cubes, ours has the advantage of combining parametric (loglinear) models with retained outliers, which enables the system to give data-independent error guarantees for every query posed on the data cube. The models also offer information about the underlying structure of the data they model. Moreover, these models are relatively easy to update dynamically as data is added to the warehouse.
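
The idea of modeling a region and retaining its outliers can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-dimensional region and the simplest loglinear model (independence, i.e., main effects only), whose fitted cell value is row total times column total divided by the grand total. The names `compress_region`, `estimate`, `lookup`, and `max_rel_error` are hypothetical, and the per-cell relative-error threshold is an assumption made for illustration; the abstract does not specify how the paper defines its error guarantee.

```python
from typing import Dict, List, Tuple

Region = List[List[float]]
# (row totals, column totals, grand total, retained outlier cells)
Model = Tuple[List[float], List[float], float, Dict[Tuple[int, int], float]]


def estimate(i: int, j: int, rows: List[float], cols: List[float],
             total: float) -> float:
    """Fitted value of cell (i, j) under the independence loglinear model."""
    return rows[i] * cols[j] / total if total else 0.0


def compress_region(region: Region, max_rel_error: float) -> Model:
    """Keep only the marginal totals plus the cells the model estimates badly."""
    rows = [sum(r) for r in region]
    cols = [sum(c) for c in zip(*region)]
    total = sum(rows)
    outliers: Dict[Tuple[int, int], float] = {}
    for i, row in enumerate(region):
        for j, actual in enumerate(row):
            approx = estimate(i, j, rows, cols, total)
            # Assumption: empty cells are judged on absolute error instead.
            scale = abs(actual) if actual else 1.0
            if abs(actual - approx) / scale > max_rel_error:
                outliers[(i, j)] = actual  # retain the exact value
    return rows, cols, total, outliers


def lookup(i: int, j: int, model: Model) -> float:
    """Answer a cell query from retained outliers first, else from the model."""
    rows, cols, total, outliers = model
    if (i, j) in outliers:
        return outliers[(i, j)]
    return estimate(i, j, rows, cols, total)


# A perfectly multiplicative region needs no retained cells at all.
region = [[10.0, 20.0], [30.0, 60.0]]
model = compress_region(region, max_rel_error=0.1)
print(lookup(1, 1, model), len(model[3]))
```

For an r × c region, this sketch stores r + c + 1 model numbers plus the retained outliers in place of r × c cells, and every cell answer is either exact or within the chosen threshold by construction. Regions with interactions between dimensions would need higher-order loglinear terms, which this sketch omits.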