Using Datacube Aggregates for Approximate Querying and Deviation Detection

Authors:
Themis Palpanas;Nick Koudas;Alberto Mendelzon
Affiliations:
-;IEEE Computer Society;-
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2005


Abstract

Much research has been devoted to the efficient computation of relational aggregations and, specifically, the efficient execution of the datacube operation. In this paper, we consider the inverse problem, that of deriving (approximately) the original data from the aggregates. We motivate this problem in the context of two specific application areas, approximate query answering and data analysis. We propose a framework based on the notion of information entropy that enables us to estimate the original values in a data set, given only aggregated information about it. We then show how approximate queries on the data from which the aggregates were derived can be performed using our framework. We also describe an alternate use of the proposed framework that enables us to identify values that deviate from the underlying data distribution, suitable for data mining purposes. We present a detailed performance study of the algorithms using both real and synthetic data, highlighting the benefits of our approach as well as the efficiency of the proposed solutions. Finally, we evaluate our techniques with a case study on a real data set, which illustrates the applicability of our approach.
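
The abstract does not say which algorithm the framework uses, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a two-dimensional table for which only the row and column totals (the aggregates) are known, and it uses iterative proportional fitting, a standard technique for obtaining the maximum-entropy estimate consistent with such totals. The function names `ipf_estimate` and `deviating_cells` and the fixed deviation threshold are hypothetical choices made for illustration, not the authors' definitions.

```python
def ipf_estimate(row_sums, col_sums, iters=50, tol=1e-9):
    """Estimate the cells of a 2-D table from its row and column totals.

    Starts from a uniform table and alternately rescales rows and columns
    until the row totals match (iterative proportional fitting). Assumes the
    row totals and column totals add up to the same grand total.
    """
    n_rows, n_cols = len(row_sums), len(col_sums)
    est = [[1.0] * n_cols for _ in range(n_rows)]
    for _ in range(iters):
        # Rescale each row so that it adds up to its known total.
        for i in range(n_rows):
            s = sum(est[i])
            if s > 0:
                f = row_sums[i] / s
                est[i] = [v * f for v in est[i]]
        # Rescale each column so that it adds up to its known total.
        for j in range(n_cols):
            s = sum(est[i][j] for i in range(n_rows))
            if s > 0:
                f = col_sums[j] / s
                for i in range(n_rows):
                    est[i][j] *= f
        # Columns match after the step above; stop once rows match too.
        if all(abs(sum(est[i]) - row_sums[i]) <= tol for i in range(n_rows)):
            break
    return est


def deviating_cells(actual, estimate, threshold):
    """Return (row, col) positions where the actual value differs from the
    estimate by more than `threshold` (an illustrative, hypothetical rule)."""
    return [
        (i, j)
        for i, row in enumerate(actual)
        for j, v in enumerate(row)
        if abs(v - estimate[i][j]) > threshold
    ]


if __name__ == "__main__":
    actual = [[10, 10, 10], [10, 10, 10], [10, 10, 50]]
    row_sums = [sum(r) for r in actual]
    col_sums = [sum(c) for c in zip(*actual)]
    estimate = ipf_estimate(row_sums, col_sums)
    print(deviating_cells(actual, estimate, threshold=10))
```

In this sketch, an approximate query over the original cells would be answered by summing the relevant entries of the estimated table, and cells whose actual values lie far from the estimate are reported as candidate deviations.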