Latent OLAP: data cubes over latent variables

Authors:
Deepak Agarwal;Bee-Chung Chen
Affiliations:
Yahoo! Research, Sunnyvale, CA, USA;Yahoo! Research, Sunnyvale, CA, USA
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011

Abstract

We introduce a novel class of data cube, called the latent-variable cube. For many data analysis tasks, data in a database can be represented as points in a multi-dimensional space. Ordinary data cubes compute aggregate functions over these "observed" data points for each cell (i.e., region) in the space, where the cells have different granularities defined by hierarchies. While useful, data cubes do not provide sufficient capability for analyzing "latent variables" that are often of interest but not directly observed in data. For example, when analyzing users' interactions with online advertisements, the observed data record whether a user clicked an ad or not. However, the real interest is often in knowing the click probabilities of ads for different user populations. In this example, click probabilities are latent variables that are not observed but have to be estimated from data. We argue that latent variables are a useful construct for a number of OLAP application scenarios. To facilitate such analyses, we propose cubes that compute aggregate functions over latent variables. Specifically, we discuss the pitfalls of common practice in scenarios where latent variables should be considered but are not; we rigorously define latent-variable cubes based on Bayesian hierarchical models and provide efficient algorithms. Through extensive experiments on both simulated and real data, we show that our method is accurate and runs orders of magnitude faster than the baseline.
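The click example in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it swaps in a simple Beta-Binomial smoothing toward the parent cell's rate as a stand-in for a Bayesian hierarchical model. The cell names, counts, the `Cell` class, and the `prior_strength` parameter are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """A cube cell holding observed click and view counts (hypothetical data)."""
    name: str
    clicks: int
    views: int


def raw_ctr(cell: Cell) -> float:
    """Ordinary aggregate over observed data: clicks divided by views."""
    return cell.clicks / cell.views if cell.views else float("nan")


def shrunk_ctr(cell: Cell, prior_mean: float, prior_strength: float) -> float:
    """Posterior mean of the click probability under a Beta prior.

    The prior is Beta(a, b) with a = prior_mean * prior_strength and
    b = (1 - prior_mean) * prior_strength, i.e. it is centred on the
    parent cell's rate and worth `prior_strength` pseudo-views.
    """
    a = prior_mean * prior_strength
    b = (1.0 - prior_mean) * prior_strength
    return (cell.clicks + a) / (cell.views + a + b)


# Child cells of one parent region; two of them have very little data.
cells = [
    Cell("US/male", clicks=50, views=1000),
    Cell("US/female", clicks=1, views=2),
    Cell("CA/male", clicks=0, views=3),
]

# The parent cell's rate serves as the prior mean for its children.
parent_rate = sum(c.clicks for c in cells) / sum(c.views for c in cells)

for c in cells:
    est = shrunk_ctr(c, prior_mean=parent_rate, prior_strength=100.0)
    print(f"{c.name}: raw={raw_ctr(c):.3f} smoothed={est:.3f}")
```

With few views, the raw ratio jumps to extreme values such as 0 or 0.5, which is the kind of pitfall that arises when the latent click probability is ignored. The smoothed estimate stays close to the parent rate for sparse cells and close to the raw ratio for well-populated ones.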