Multidimensional OLAP products provide an excellent opportunity for integrating mining functionality because of their widespread acceptance as a decision support tool and their existing heavy reliance on manual, user-driven analysis. Most OLAP products are rather simplistic and rely heavily on the user's intuition to manually drive the discovery process. Such ad hoc user-driven exploration gets tedious and error-prone as data dimensionality and size increase. Our goal is to automate these manual discovery processes. In this paper we present an example of such automation through an iDiff operator that in a single step returns summarized reasons for drops or increases observed at an aggregated level. We formulate this as a problem of summarizing the difference between two multidimensional arrays of real numbers. We develop a general framework for such summarization and propose a specific formulation for the case of OLAP aggregates. We develop an information-theoretic formulation for expressing the reasons that is compact and easy to interpret. We design an efficient dynamic programming algorithm that requires only one pass over the data and uses a small amount of memory independent of the data size. This allows easy integration with existing OLAP products. Our prototype has been tested on the Microsoft OLAP server, DB2/UDB and Oracle 8i. Experiments using the OLAP benchmark demonstrate (1) scalability of our algorithm as the size and dimensionality of the cube increase and (2) feasibility of getting interactive answers with modest hardware resources.
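To make the problem statement concrete, the following minimal sketch shows the kind of manual drill-down that the abstract says the operator is meant to automate: given two sets of cells, it ranks the values of one dimension by how much their totals changed. It is not the paper's information-theoretic formulation or its dynamic programming algorithm. The function name `top_changes`, the cell layout, and the sample figures are all hypothetical.

```python
from collections import defaultdict


def top_changes(before, after, dim, n=3):
    """Rank the values of one dimension by absolute change in their totals.

    `before` and `after` map cell coordinates (tuples) to measures.
    This is a naive illustration, not the iDiff algorithm.
    """
    delta = defaultdict(float)
    for key, value in after.items():
        delta[key[dim]] += value
    for key, value in before.items():
        delta[key[dim]] -= value
    # Largest absolute change first.
    return sorted(delta.items(), key=lambda kv: abs(kv[1]), reverse=True)[:n]


# Hypothetical cells keyed by (product, region).
before = {("A", "East"): 100.0, ("A", "West"): 80.0, ("B", "East"): 50.0}
after = {("A", "East"): 40.0, ("A", "West"): 85.0, ("B", "East"): 55.0}

print(top_changes(before, after, dim=1))
```

With this sample data, the overall drop is concentrated in the East region. Repeating such a ranking by hand for every dimension and every level is the tedious exploration the abstract describes.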