iDiff: Informative Summarization of Differences in Multidimensional Aggregates

Authors: Sunita Sarawagi
Affiliations: Indian Institute of Technology, Bombay, India. sunita@it.iitb.ernet.in
Venue: Data Mining and Knowledge Discovery
Year: 2001

Abstract

Multidimensional OLAP products provide an excellent opportunity for integrating mining functionality because of their widespread acceptance as a decision support tool and their existing heavy reliance on manual, user-driven analysis. Most OLAP products are rather simplistic and rely heavily on the user's intuition to manually drive the discovery process. Such ad hoc user-driven exploration gets tedious and error-prone as data dimensionality and size increase. Our goal is to automate these manual discovery processes. In this paper we present an example of such automation through an iDiff operator that in a single step returns summarized reasons for drops or increases observed at an aggregated level. We formulate this as a problem of summarizing the difference between two multidimensional arrays of real numbers. We develop a general framework for such summarization and propose a specific formulation for the case of OLAP aggregates. We develop an information-theoretic formulation for expressing the reasons that is compact and easy to interpret. We design an efficient dynamic programming algorithm that requires only one pass over the data and uses a small amount of memory independent of the data size. This allows easy integration with existing OLAP products. Our prototype has been tested on the Microsoft OLAP server, DB2/UDB and Oracle 8i. Experiments using the OLAP benchmark demonstrate (1) scalability of our algorithm as the size and dimensionality of the cube increase and (2) feasibility of getting interactive answers with modest hardware resources.
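
To make the problem setting concrete, the following is a minimal sketch of what "summarizing the difference between two multidimensional arrays" can look like. It is not the paper's information-theoretic formulation or its one-pass dynamic programming algorithm. It is a naive baseline that rolls the cell-level change up to every combination of dimensions and reports the rows with the largest absolute change. The function name `summarize_difference`, the `"*"` wildcard convention, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def summarize_difference(before, after, dims, top_n=3):
    """Naive baseline (NOT the iDiff algorithm): rank roll-ups of the change.

    `before` and `after` map a tuple of dimension values to an aggregate.
    Returns the overall change and the `top_n` roll-up rows with the
    largest absolute change, where "*" stands for "all values".
    """
    # Cell-level change; a cell missing from one array counts as 0 there.
    cells = set(before) | set(after)
    delta = {c: after.get(c, 0.0) - before.get(c, 0.0) for c in cells}

    # Roll each cell's change up to every subset of the dimensions.
    n = len(dims)
    rollups = defaultdict(float)
    for cell, change in delta.items():
        for r in range(n + 1):
            for kept in combinations(range(n), r):
                key = tuple(cell[i] if i in kept else "*" for i in range(n))
                rollups[key] += change

    total = rollups[("*",) * n]
    # Exclude the grand total itself; break ties deterministically by key.
    candidates = [k for k in rollups if any(v != "*" for v in k)]
    ranked = sorted(candidates, key=lambda k: (-abs(rollups[k]), k))
    return total, [(k, rollups[k]) for k in ranked[:top_n]]


# Hypothetical two-dimensional example: sales by (product, region).
dims = ("product", "region")
before = {("A", "East"): 100.0, ("A", "West"): 80.0,
          ("B", "East"): 50.0, ("B", "West"): 60.0}
after = {("A", "East"): 40.0, ("A", "West"): 75.0,
         ("B", "East"): 55.0, ("B", "West"): 60.0}

total, reasons = summarize_difference(before, after, dims)
print(total)    # overall change at the aggregated level
print(reasons)  # rows that account for most of the change
```

This baseline enumerates every roll-up, so its cost grows exponentially with the number of dimensions, and its rows overlap heavily (a detailed row and its parent can both be reported). The abstract describes avoiding both problems with a compact formulation and a single pass that uses a small amount of memory.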