Advanced SQL modeling in RDBMS

  • Authors:
  • Andrew Witkowski; Srikanth Bellamkonda; Tolga Bozkaya; Nathan Folkert; Abhinav Gupta; John Haydu; Lei Sheng; Sankar Subramanian

  • Affiliations:
  • Oracle Corporation, Redwood Shores, CA (all authors)

  • Venue:
  • ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2003
  • Year:
  • 2005

Abstract

Commercial relational database systems lack support for complex business modeling. ANSI SQL cannot treat relations as multidimensional arrays and define multiple, interrelated formulas over them, operations that are needed for business modeling. Relational OLAP (ROLAP) applications have to perform such tasks using joins, SQL window functions, complex CASE expressions, and the GROUP BY operator simulating the pivot operation. The designated place in SQL for calculations is the SELECT clause, which is extremely limiting and forces the user to generate queries with nested views, subqueries, and complex joins. Furthermore, SQL query optimizers are preoccupied with determining efficient join orders and choosing optimal access methods, and largely disregard optimization of multiple, interrelated formulas. Research into execution methods has thus far concentrated on efficient computation of data cubes and cube compression rather than on access structures for random, interrow calculations. This has created a gap that has been filled by spreadsheets and specialized MOLAP engines, which are good at specification of formulas for modeling but lack the formalism of the relational model, are difficult to coordinate across large user groups, exhibit scalability problems, and require replication of data between the tool and the RDBMS. This article presents an SQL extension, called SQL Spreadsheet, that provides array calculations over relations for complex modeling. We present optimizations, access structures, and execution models for processing these calculations efficiently. Special attention is paid to compile-time optimization for expensive operations like aggregation. Furthermore, ANSI SQL does not provide a good separation between data and computation and hence cannot support parameterization for SQL Spreadsheet models. We propose two parameterization methods for SQL. The first parameterizes ANSI SQL views using subqueries and scalars, which allows passing data to SQL Spreadsheet. The second parameterizes the SQL Spreadsheet formulas themselves, which supports building stand-alone SQL Spreadsheet libraries. These models are then subject to the SQL Spreadsheet optimizations at model invocation time.
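
To make the idea of "array calculations over relations" concrete, the following Python sketch treats a small relation as an array addressed by its dimension columns and evaluates two interrelated formulas over it. It is an illustration only, not the SQL Spreadsheet syntax or the execution model described in the article; the sample data, the column names (product, year, sales), the two formulas, and the function names are all hypothetical.

```python
# Hypothetical relation: two dimension columns (product, year), one measure (sales).
rows = [
    ("dvd", 2001, 100.0),
    ("dvd", 2002, 120.0),
    ("vcr", 2001, 80.0),
    ("vcr", 2002, 60.0),
]


def as_array(relation):
    """Index the relation by its dimension columns so each cell is addressable."""
    return {(product, year): sales for product, year, sales in relation}


def apply_formulas(cells):
    """Evaluate two interrelated formulas, in dependency order, on a copy."""
    out = dict(cells)
    # Formula 1 (assumed): dvd sales in 2003 are the sum of the two prior years.
    out[("dvd", 2003)] = out[("dvd", 2001)] + out[("dvd", 2002)]
    # Formula 2 (assumed): vcr sales in 2003 are half of the dvd cell just
    # computed, so it must run after formula 1.
    out[("vcr", 2003)] = 0.5 * out[("dvd", 2003)]
    return out


result = apply_formulas(as_array(rows))
for (product, year), sales in sorted(result.items()):
    print(product, year, sales)
```

Expressing the same two formulas in plain ANSI SQL would typically require self-joins or nested subqueries, one per cell reference, which is the difficulty the abstract describes.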