Our goal is to enhance multidimensional database systems with a suite of advanced operators that automate data-analysis tasks currently handled through manual exploration. In this paper, we present a key component of our system that characterizes the information content of a cell based on a user's prior familiarity with the cube and provides context-sensitive exploration of the cube. The component has three main modules: a Tracker, which continuously records the parts of the cube a user has visited; a Modeler, which pieces together the information in the visited parts to model the user's expected values for the unvisited parts; and an Informer, which answers the user's queries about the most informative unvisited parts of the cube. The mathematical basis for the expected-value modeling is the classical maximum entropy principle: expected values are computed so as to agree with every value already visited, while assumptions about unvisited values are kept to a minimum by maximizing their entropy. The most informative values are defined as those that bring the new expected values closest to the actual values. Our experiments show that such user-in-the-loop exploration enables much faster assimilation of all significant information in the data than existing manual exploration.
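To make the maximum entropy idea concrete, the following is a minimal sketch (not the paper's actual algorithm) of one classical special case: when the values a user has visited are the row and column aggregates of a two-dimensional cube slice, iterative proportional fitting yields the maximum-entropy table consistent with those marginals. The function name `maxent_fill` and the example numbers are illustrative assumptions.

```python
import numpy as np

def maxent_fill(row_sums, col_sums, iters=200):
    """Estimate unvisited cells of a 2-D slice by iterative proportional
    fitting (IPF). Starting from a uniform table, IPF alternately rescales
    rows and columns to match the visited aggregates; the fixed point is
    the maximum-entropy table agreeing with those marginals."""
    row_sums = np.asarray(row_sums, dtype=float)
    col_sums = np.asarray(col_sums, dtype=float)
    X = np.ones((len(row_sums), len(col_sums)))
    for _ in range(iters):
        X *= (row_sums / X.sum(axis=1))[:, None]  # match row aggregates
        X *= col_sums / X.sum(axis=0)             # match column aggregates
    return X

# Hypothetical example: the user has seen only the row totals (30, 70)
# and column totals (40, 60) of a 2x2 slice.
est = maxent_fill([30, 70], [40, 60])
```

With only marginals visited, the maximum-entropy estimate is the independence table (outer product of marginals over the grand total), reflecting that no interaction between dimensions is assumed until the user visits cells that reveal one.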