Copy-Based versus Edit-Based Version Management Schemes for Structured Documents

  • Authors:
  • Shu-Yao Chien;Carlo Zaniolo;Vassilis J. Tsotras

  • Affiliations:
  • -;-;-

  • Venue:
  • RIDE '01 Proceedings of the 11th International Workshop on research Issues in Data Engineering
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

Abstract: Managing multiple versions of XML documents and semistructured data represents a problem of growing interest. Traditional version control methods, such as RCS, use edit scripts representing changes in the document to support the incremental reconstruction of different versions. The edit-based approaches have been recently enhanced with a replication scheme called UBCC [2]. UBCC is based on the notion of page usefulness and ensures effective management for multi-version documents in terms of both retrieval and storage cost. These improvements notwithstanding, the edit-based representation suffers from limited generality and flexibility--e.g., it cannot represent changes such as rearranging the document or duplicating parts of its content. To solve these problems, the paper proposes a copy-based UBCC versioning scheme, which also provides a simpler format for the electronic exchange of multi-version documents. With the objective of matching the performance of the edit-based UBCC technique, we develop algorithms that enhance the copy-based UBCC scheme with page usefulness management. We also present results of various experiments that test the storage and retrieval performance of the new copy-based approach, and compare it with that of the edit-based UBCC approach.