Multiversion join index for multiversion data warehouse

  • Authors:
  • Jan Chmiel; Tadeusz Morzy; Robert Wrembel

  • Affiliations:
  • Poznań University of Technology, Institute of Computing Science, Piotrowo 2, 60-965 Poznań, Poland (all three authors)

  • Venue:
  • Information and Software Technology
  • Year:
  • 2009

Abstract

Data warehouse (DW) technology was developed to support the integration of external data sources (EDSs) for advanced data analysis by On-Line Analytical Processing (OLAP) applications. Since the contents and structures of the integrated EDSs may evolve over time, the content and schema of a DW must evolve too in order to correctly reflect the evolution of the EDSs. To manage DW evolution, we developed the multiversion data warehouse (MVDW) approach. In this approach, different states of a DW are represented by a sequence of persistent DW versions, each of which corresponds either to a real-world state or to a simulation scenario. Typically, OLAP applications execute star queries that join multiple fact and dimension tables. An important optimization technique for this kind of query is based on join indexes. Since fact and dimension data in the MVDW are physically distributed among multiple DW versions, standard join indexes need to be extended. In this paper we present the concept of a multiversion join index (MVJI) for indexing dimension and fact tables in the MVDW. The MVJI has a two-level structure, in which the upper level indexes attributes and the lower level indexes DW versions. The paper also presents a theoretical upper-bound (pessimistic) analysis of the MVJI's performance characteristics with respect to I/O operations. The analysis is followed by an experimental evaluation, which shows that the MVJI improves system performance for queries that address multiple DW versions with exact-match and range predicates.
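
The two-level structure described in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the class name `MultiversionJoinIndexSketch`, its method names, and the use of a dictionary plus a sorted key list in place of a disk-based index are all assumptions made for illustration. The upper level maps a join-attribute value to a lower-level index, and the lower level maps a DW version identifier to the (dimension row, fact row) pairs stored in that version.

```python
from bisect import bisect_left, bisect_right, insort


class MultiversionJoinIndexSketch:
    """Hypothetical toy model: attribute value -> DW version -> row-id pairs."""

    def __init__(self):
        # Upper level: join-attribute value -> lower-level index.
        self._upper = {}
        # Sorted attribute values, used to answer range predicates.
        self._keys = []

    def insert(self, value, version, dim_row_id, fact_row_id):
        """Record that dim_row_id joins fact_row_id on `value` in `version`."""
        if value not in self._upper:
            self._upper[value] = {}
            insort(self._keys, value)
        # Lower level: DW version id -> list of (dim_row_id, fact_row_id).
        self._upper[value].setdefault(version, []).append((dim_row_id, fact_row_id))

    def exact_match(self, value, versions):
        """Return the join pairs for `value`, restricted to the given versions."""
        lower = self._upper.get(value, {})
        return {v: list(lower[v]) for v in versions if v in lower}

    def range_match(self, low, high, versions):
        """Return the join pairs for low <= value <= high in the given versions."""
        start = bisect_left(self._keys, low)
        stop = bisect_right(self._keys, high)
        result = {}
        for value in self._keys[start:stop]:
            for v, pairs in self.exact_match(value, versions).items():
                result.setdefault(v, []).extend(pairs)
        return result


# Usage: index a few join pairs spread over three hypothetical DW versions.
idx = MultiversionJoinIndexSketch()
idx.insert(10, "V1", 1, 100)
idx.insert(10, "V2", 1, 200)
idx.insert(20, "V2", 2, 201)
idx.insert(30, "V3", 3, 300)
exact = idx.exact_match(10, ["V1", "V2"])
ranged = idx.range_match(10, 20, ["V2", "V3"])
```

A single lookup on the attribute value reaches every requested version through one upper-level entry, which is the intuition behind using such an index for queries that span multiple DW versions. The sketch ignores I/O cost, which is the subject of the paper's analysis.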