Bottom-up computation of sparse and Iceberg CUBE
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Efficient computation of Iceberg cubes with complex measures
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Journal of Intelligent Information Systems
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Data Mining and Knowledge Discovery
Discovery-Driven Exploration of OLAP Data Cubes
EDBT '98 Proceedings of the 6th International Conference on Extending Database Technology: Advances in Database Technology
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
VLDB '05 Proceedings of the 31st international conference on Very large data bases
OLAP over uncertain and imprecise data
The VLDB Journal — The International Journal on Very Large Data Bases
Optimizing mpf queries: decision support and probabilistic inference
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Estimating rates of rare events at multiple resolutions
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient query evaluation on probabilistic databases
The VLDB Journal — The International Journal on Very Large Data Bases
Multi-dimensional regression analysis of time-series data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Star-cubing: computing iceberg cubes by top-down and bottom-up integration
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Sampling cube: a framework for statistical olap over sampling data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
BayesStore: managing large, uncertain data repositories with probabilistic graphical models
Proceedings of the VLDB Endowment
A Survey of Uncertain Data Algorithms and Applications
IEEE Transactions on Knowledge and Data Engineering
Explore/Exploit Schemes for Web Content Optimization
ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
Estimating rates of rare events with multiple hierarchies through scalable log-linear models
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
A neural-based approach for extending OLAP to prediction
DaWaK'12 Proceedings of the 14th international conference on Data Warehousing and Knowledge Discovery
Discovering diverse association rules from multidimensional schema
Expert Systems with Applications: An International Journal
Hi-index | 0.00 |
We introduce a novel class of data cube, called latent-variable cube. For many data analysis tasks, data in a database can be represented as points in a multi-dimensional space. Ordinary data cubes compute aggregate functions over these "observed" data points for each cell (i.e., region) in the space, where the cells have different granularities defined by hierarchies. While useful, data cubes do not provide sufficient capability for analyzing "latent variables" that are often of interest but not directly observed in data. For example, when analyzing users' interaction with online advertisements, observed data informs whether a user clicked an ad or not. However, the real interest is often in knowing the click probabilities of ads for different user populations. In this example, click probabilities are latent variables that are not observed but have to be estimated from data. We argue that latent variables are a useful construct for a number of OLAP application scenarios. To facilitate such analyses, we propose cubes that compute aggregate functions over latent variables. Specifically, we discuss the pitfalls of common practice in scenarios where latent variables should, but are not considered; we rigorously define latent-variable cube based on Bayesian hierarchical models and provide efficient algorithms. Through extensive experiments on both simulated and real data, we show that our method is accurate and runs orders of magnitude faster than the baseline.