Data densification in a relational database system

Authors:
Abhinav Gupta;Sankar Subramanian;Srikanth Bellamkonda;Tolga Bozkaya;Nathan Folkert;Lei Sheng;Andrew Witkowski
Affiliations:
Oracle Corporation, Redwood Shores, CA;Oracle Corporation, Redwood Shores, CA;Oracle Corporation, Redwood Shores, CA;Oracle Corporation, Redwood Shores, CA;Oracle Corporation, Redwood Shores, CA;Oracle Corporation, Redwood Shores, CA;Oracle Corporation, Redwood Shores, CA
Venue:
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Year:
2004

Citing 3
Cited 1

Predicate migration: optimizing queries with expensive predicates

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Query Optimization by Predicate Move-Around

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Spreadsheets in RDBMS for OLAP

Proceedings of the 2003 ACM SIGMOD international conference on Management of data

Continuous queries in oracle

VLDB '07 Proceedings of the 33rd international conference on Very large data bases

Quantified Score

Hi-index	0.00

Visualization

Abstract

Data in a relational data warehouse is usually sparse. That is, if no value exists for a given combination of dimension values, no row exists in the fact table. Densities of 0.1-2% are very common. However, users may want to view the data in a dense form, with rows for all combination of dimension values displayed even when no fact data exists for them. For example, if a product did not sell during a particular time period, users may still want to see the product for that time period with zero sales value next to it. Moreover, analytic window functions [1] and the SQL model clause [2] can more easily express time series calculations if data is dense along the time dimension because dense data will fill a consistent number of rows for each period.Data densification is the process of converting spare data into dense form. The current SQL technique for densification (using the combination of DISTINCT, CROSS JOIN and OUTER JOIN operations) is extremely unintuitive, difficult to express and inefficient to compute. Hence, we propose an extension to the ANSI SQL join operator, referred to as "PARTITIONED OUTER JOIN", which allows for a succinct expression of densification along the dimensions of interest. We also present various algorithms to evaluate the new join operator efficiently and compare it with existing methods of doing the equivalent operation. We also define a new window function "LAST_VALUE (IGNORE NULLS)" which is very useful with partitioned outer join.