Minimal data sets vs. synchronized data copies in a schema and data versioning system

  • Authors:
  • Bob Wall; Rafal Angryk

  • Affiliations:
  • Montana State University, Bozeman, MT, USA; Montana State University, Bozeman, MT, USA

  • Venue:
  • Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management
  • Year:
  • 2011

Abstract

In this paper, we describe a key component of ScaDaVer, our proposed database schema and data versioning system. The versioning system is based on common practices used to manage source code changes in software development. It allows users of a database to create branches in which changes are isolated from the main database and from other branches. Schema and data versioning techniques are used to isolate the changes made within each branch. We are investigating two approaches to schema and data versioning. The first stores, for each branch, only the minimal set of changes from the base schema and data, and maps queries in the branch back to the primary database to retrieve most data; these query results are then merged with the results from the branch data. The second creates a copy of each table modified in a branch and maps any updates to the primary database table into the branch. We are investigating the qualitative and quantitative differences between these two techniques under different usage patterns. For the query mapping technique, we are also working to prove the correctness of the mapped queries by expressing queries in multi-relational algebra and showing that the mapped queries are equivalent to the same queries against a database without versioning.
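
The contrast between the two approaches can be illustrated with a toy sketch. This is not ScaDaVer's implementation: the class names (DeltaBranch, CopyBranch), the representation of a table as a dictionary keyed by primary key, and the rule that a row changed in the branch is not overwritten by later primary updates are all assumptions made for illustration.

```python
# Hypothetical sketch: a "table" is a dict mapping primary key -> row value.

class DeltaBranch:
    """Approach 1 (assumed shape): keep only the rows the branch changed."""

    def __init__(self, base):
        self.base = base        # shared reference to the primary table
        self.upserts = {}       # rows inserted or updated in the branch
        self.deleted = set()    # keys deleted in the branch

    def put(self, key, row):
        self.upserts[key] = row
        self.deleted.discard(key)

    def delete(self, key):
        self.upserts.pop(key, None)
        self.deleted.add(key)

    def select(self):
        # Map the query back to the primary table, then merge in branch data.
        view = {k: v for k, v in self.base.items()
                if k not in self.deleted and k not in self.upserts}
        view.update(self.upserts)
        return view


class CopyBranch:
    """Approach 2 (assumed shape): copy the table, replay primary updates."""

    def __init__(self, base):
        self.copy = dict(base)  # full copy of the modified table
        self.touched = set()    # keys changed locally (assumed to take priority)

    def put(self, key, row):
        self.copy[key] = row
        self.touched.add(key)

    def delete(self, key):
        self.copy.pop(key, None)
        self.touched.add(key)

    def on_primary_put(self, key, row):
        # Propagate a primary update unless the branch changed that row.
        if key not in self.touched:
            self.copy[key] = row

    def select(self):
        return dict(self.copy)


base = {1: "alice", 2: "bob", 3: "carol"}
delta, copied = DeltaBranch(base), CopyBranch(base)
for branch in (delta, copied):
    branch.put(2, "robert")
    branch.delete(3)
    branch.put(4, "dave")

# An update made to the primary table after the branches were created.
base[1] = "alicia"
copied.on_primary_put(1, "alicia")

# Both branches expose the same logical view of the table.
assert delta.select() == copied.select()
```

In this sketch the delta branch stores three small entries and pays a merge cost on every read, while the copy branch stores the whole table and pays a synchronization cost on every primary write; this is the kind of trade-off that would be expected to vary with usage patterns.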