Minimal data sets vs. synchronized data copies in a schema and data versioning system

Authors:
Bob Wall; Rafal Angryk
Affiliations:
Montana State University, Bozeman, MT, USA; Montana State University, Bozeman, MT, USA
Venue:
Proceedings of the 4th Workshop for Ph.D. Students in Information & Knowledge Management
Year:
2011

Abstract

In this paper, we describe a key component of ScaDaVer, our proposed database schema and data versioning system. The versioning system is based on common practices used to manage source code changes in software development. It allows users of a database to create branches in which changes to the database are isolated from the main database and from other branches. Schema and data versioning techniques are used to isolate the changes made within each branch. We are investigating two approaches to schema and data versioning. The first stores, for each branch, the minimal set of changes from the base schema and data, and maps queries in the branch back to the primary database to retrieve most data; these query results are then merged with the results from the branch data. The second creates a copy of each table modified in a branch and maps any updates to the primary database table into the branch. We are investigating the qualitative and quantitative differences between these two techniques under different usage patterns. For the query mapping technique, we are also working to prove the correctness of the mapped queries by expressing queries in multi-relational algebra and showing that each mapped query is equivalent to the same query against a database without versioning.
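
To make the contrast between the two approaches concrete, the following is a minimal sketch using Python's built-in sqlite3 module. It is not ScaDaVer's implementation: the table names (base, branch_delta, branch_copy), the deleted tombstone column, and the branch_touched set are all assumptions introduced here for illustration, and the sketch covers only a single table with no schema changes.

```python
import sqlite3

# Hypothetical mapped query for the minimal-change-set approach: take the
# base rows the branch has not touched, then merge in the branch's live rows.
MAPPED_QUERY = """
    SELECT id, name FROM base
    WHERE id NOT IN (SELECT id FROM branch_delta)
    UNION ALL
    SELECT id, name FROM branch_delta WHERE deleted = 0
    ORDER BY id
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE base (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO base VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        -- Approach 1: the branch stores only the rows it changed.
        CREATE TABLE branch_delta (
            id INTEGER PRIMARY KEY,
            name TEXT,
            deleted INTEGER NOT NULL DEFAULT 0
        );
        -- Approach 2: the branch holds a full copy of the modified table.
        CREATE TABLE branch_copy (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO branch_copy SELECT id, name FROM base;
    """)

    # The same three branch edits, recorded in each representation:
    # rename row 2, delete row 3, insert row 4.
    conn.executemany(
        "INSERT OR REPLACE INTO branch_delta VALUES (?, ?, ?)",
        [(2, "robert", 0), (3, None, 1), (4, "dee", 0)],
    )
    conn.execute("UPDATE branch_copy SET name = 'robert' WHERE id = 2")
    conn.execute("DELETE FROM branch_copy WHERE id = 3")
    conn.execute("INSERT INTO branch_copy VALUES (4, 'dee')")
    branch_touched = {2, 3, 4}  # assumed bookkeeping of branch-modified rows

    # An update to the primary database after the branch was created.
    conn.execute("UPDATE base SET name = 'anna' WHERE id = 1")
    # Approach 2 must push that update into the copy, skipping rows the
    # branch has already changed. Approach 1 needs no such step, because
    # the mapped query reads the base table directly.
    if 1 not in branch_touched:
        conn.execute("UPDATE branch_copy SET name = 'anna' WHERE id = 1")

    delta_view = conn.execute(MAPPED_QUERY).fetchall()
    copy_view = conn.execute(
        "SELECT id, name FROM branch_copy ORDER BY id"
    ).fetchall()
    conn.close()
    return delta_view, copy_view


if __name__ == "__main__":
    delta_view, copy_view = demo()
    print(delta_view)
    print(copy_view)
```

In this sketch both representations give the branch the same view of the table. The cost falls in different places: the first approach pays on every read, through the merged query, while the second pays in storage and on every update to the primary table. Agreement on one example is not a correctness argument, which is why the mapped queries are to be proven equivalent algebraically.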