Data densification in a relational database system

  • Authors:
  • Abhinav Gupta;Sankar Subramanian;Srikanth Bellamkonda;Tolga Bozkaya;Nathan Folkert;Lei Sheng;Andrew Witkowski

  • Affiliations:
  • Oracle Corporation, Redwood Shores, CA;Oracle Corporation, Redwood Shores, CA;Oracle Corporation, Redwood Shores, CA;Oracle Corporation, Redwood Shores, CA;Oracle Corporation, Redwood Shores, CA;Oracle Corporation, Redwood Shores, CA;Oracle Corporation, Redwood Shores, CA

  • Venue:
  • SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Data in a relational data warehouse is usually sparse. That is, if no value exists for a given combination of dimension values, no row exists in the fact table. Densities of 0.1-2% are very common. However, users may want to view the data in a dense form, with rows for all combination of dimension values displayed even when no fact data exists for them. For example, if a product did not sell during a particular time period, users may still want to see the product for that time period with zero sales value next to it. Moreover, analytic window functions [1] and the SQL model clause [2] can more easily express time series calculations if data is dense along the time dimension because dense data will fill a consistent number of rows for each period.Data densification is the process of converting spare data into dense form. The current SQL technique for densification (using the combination of DISTINCT, CROSS JOIN and OUTER JOIN operations) is extremely unintuitive, difficult to express and inefficient to compute. Hence, we propose an extension to the ANSI SQL join operator, referred to as "PARTITIONED OUTER JOIN", which allows for a succinct expression of densification along the dimensions of interest. We also present various algorithms to evaluate the new join operator efficiently and compare it with existing methods of doing the equivalent operation. We also define a new window function "LAST_VALUE (IGNORE NULLS)" which is very useful with partitioned outer join.