Implementing data cubes efficiently
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Bottom-up computation of sparse and Iceberg CUBE
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Outlier detection for high dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Machine Learning
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total
ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Potter's Wheel: An Interactive Data Cleaning System
Proceedings of the 27th International Conference on Very Large Data Bases
Query Expansion by Mining User Logs
IEEE Transactions on Knowledge and Data Engineering
QC-trees: an efficient summary structure for semantic OLAP
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Robust and efficient fuzzy match for online data cleaning
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
An introduction to variable and feature selection
The Journal of Machine Learning Research
Subspace clustering for high dimensional data: a review
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
OLAP over uncertain and imprecise data
VLDB '05 Proceedings of the 31st international conference on Very large data bases
VLDB '05 Proceedings of the 31st international conference on Very large data bases
ULDBs: databases with uncertainty and lineage
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Personalized query expansion for the web
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Quotient cube: how to summarize the semantics of a data cube
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Star-cubing: computing iceberg cubes by top-down and bottom-up integration
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
High-dimensional OLAP: a minimal cubing approach
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
The polynomial complexity of fully materialized coalesced cubes
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
OLAP over imprecise data with domain constraints
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
OLAP-based query recommendation
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Multidimensional cyclic graph approach: Representing a data cube without common sub-graphs
Information Sciences: an International Journal
Latent OLAP: data cubes over latent variables
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Hi-index | 0.00 |
Sampling is a popular method of data collection when it is impossible or too costly to reach the entire population. For example, television show ratings in the United States are gathered from a sample of roughly 5,000 households. To use the results effectively, the samples are further partitioned in a multidimensional space based on multiple attribute values. This naturally leads to the desirability of OLAP (Online Analytical Processing) over sampling data. However, unlike traditional data, sampling data is inherently uncertain, i.e., not representing the full data in the population. Thus, it is desirable to return not only query results but also the confidence intervals indicating the reliability of the results. Moreover, a certain segment in a multidimensional space may contain none or too few samples. This requires some additional analysis to return trustable results. In this paper we propose a Sampling Cube framework, which efficiently calculates confidence intervals for any multidimensional query and uses the OLAP structure to group similar segments to increase sampling size when needed. Further, to handle high dimensional data, a Sampling Cube Shell method is proposed to effectively reduce the storage requirement while still preserving query result quality.