Large-scale data analysis and mining activities, such as identifying interesting trends, making unusual patterns stand out, and verifying hypotheses, require sophisticated information extraction queries. Being able to express these data mining queries concisely is of major importance from both the user's and the system's point of view. Recent research in OLAP has focused on datacubes and their applications; however, the expression and processing of ad hoc decision support queries have received very little attention. In this paper we present an appropriate framework for these queries and introduce a syntactic construct to support it. This SQL extension allows most OLAP queries, such as complex intra- and inter-group comparisons, trends, and hierarchical comparisons, to be expressed in a compact, intuitive, and simple manner. The syntactic extension is not, however, the focus of this paper. The succinct representation of a complex OLAP query translates immediately to a novel, simple, and efficient evaluation algorithm, and this algorithm constitutes the main contribution of the paper. We show how to optimize, analyze, and parallelize the algorithm and discuss issues such as multiple query analysis and scaling. Finally, we describe our implementation on top of a commercial system and present experimental results on real-life queries that show orders-of-magnitude performance improvements in certain cases. We argue that this tight coupling between representation and algorithm is essential to efficient processing of ad hoc OLAP queries.
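To make the notion of an intra-group comparison concrete, the following minimal Python sketch computes, for each product, the number of sales whose amount exceeds that product's own average. It is an illustration only and is not the syntactic construct or the evaluation algorithm of the paper, neither of which is specified in this abstract. The `(product, amount)` schema, the sample rows, and the function name `above_average_counts` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows: (product, amount). Illustrative data only.
SALES = [
    ("pen", 2.0), ("pen", 4.0), ("pen", 9.0),
    ("ink", 10.0), ("ink", 30.0),
]


def above_average_counts(rows):
    """For each product, count the sales whose amount exceeds that product's average."""
    # Pass 1: accumulate the sum and the row count of every group.
    totals = defaultdict(lambda: [0.0, 0])
    for product, amount in rows:
        totals[product][0] += amount
        totals[product][1] += 1
    averages = {p: s / n for p, (s, n) in totals.items()}

    # Pass 2: compare each row against the average of its own group.
    counts = {p: 0 for p in averages}
    for product, amount in rows:
        if amount > averages[product]:
            counts[product] += 1
    return counts


if __name__ == "__main__":
    print(above_average_counts(SALES))
```

In standard SQL the same question typically needs a self-join or a correlated subquery against the grouped averages; the sketch instead makes two sequential passes over the rows and updates one small table of per-group values, which is the kind of compact formulation the abstract argues such queries should admit.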