Archiving scientific data

Authors:
Peter Buneman; Sanjeev Khanna; Keishi Tajima; Wang-Chiew Tan
Affiliations:
University of Edinburgh and University of Pennsylvania; University of Pennsylvania; Japan Advanced Institute of Science and Technology; University of Pennsylvania
Venue:
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Year:
2002

References cited by this paper (11)

Making data structures persistent
Journal of Computer and System Sciences - 18th Annual ACM Symposium on Theory of Computing (STOC), May 28-30, 1986

Simple fast algorithms for the editing distance between trees and related problems
SIAM Journal on Computing

Fast algorithms for the unit cost editing distance between trees
Journal of Algorithms

Change detection in hierarchically structured information
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data

Meaningful change detection in structured data
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data

XMill: an efficient compressor for XML data
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data

Keys for XML
Proceedings of the 10th international conference on World Wide Web

Database Management Systems
Database Management Systems

Change-Centric Management of Versions in an XML Warehouse
Proceedings of the 27th International Conference on Very Large Data Bases

Efficient Management of Multiversion Documents by Object Referencing
Proceedings of the 27th International Conference on Very Large Data Bases

The XML benchmark project
The XML benchmark project

Papers citing this paper (31)

Discovering approximate keys in XML data
Proceedings of the eleventh international conference on Information and knowledge management

The Grid: an application of the semantic web
ACM SIGMOD Record

Towards Collaborative Content Management and Version Control for Structured Mathematical Knowledge
MKM '03 Proceedings of the Second International Conference on Mathematical Knowledge Management

Efficient schemes for managing multiversion XML documents
The VLDB Journal — The International Journal on Very Large Data Bases

NEXSORT: Sorting XML in External Memory
ICDE '04 Proceedings of the 20th International Conference on Data Engineering

Data stream management for historical XML data
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data

Deep Store: An Archival Storage System Architecture
ICDE '05 Proceedings of the 21st International Conference on Data Engineering

An XML-Based Approach to Publishing and Querying the History of Databases
World Wide Web

Comparative Analysis of XML Compression Technologies
World Wide Web

Supporting complex queries on multiversion XML documents
ACM Transactions on Internet Technology (TOIT)

Granularity reduction in temporal document databases
Information Systems

Provenance and Annotation for Visual Exploration Systems
IEEE Transactions on Visualization and Computer Graphics

Temporal slicing in the evaluation of XML queries
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

Efficient provenance storage
Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Temporal XML: modeling, indexing, and query processing
The VLDB Journal — The International Journal on Very Large Data Bases

A Comparison of XML-Based Temporal Models
Advanced Internet Based Systems and Applications

Why not?
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

Modeling Concept Evolution: A Historical Perspective
ER '09 Proceedings of the 28th International Conference on Conceptual Modeling

An XQuery-based version extension of an XML native database
Proceedings of the 2009 EDBT/ICDT Workshops

Granularity reduction in temporal document databases
Information Systems

Managing scientific data
Communications of the ACM

Constraint preserving XML updating
APWeb'03 Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications

3D_XML: a three-dimensional XML-based model
SOFSEM'08 Proceedings of the 34th conference on Current trends in theory and practice of computer science

Design, implementation and use of a simulation data archive for coastal science
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing

Understanding documentation and reconstruction requirements for computer-assisted decision processes
Decision Support Systems

The Foundations for Provenance on the Web
Foundations and Trends in Web Science

PRESIDIO: A Framework for Efficient Archival Data Storage
ACM Transactions on Storage (TOS)

A pattern-based temporal XML query language
WISE'10 Proceedings of the 11th international conference on Web information systems engineering

Supporting queries spanning across phases of evolving artifacts using Steiner forests
Proceedings of the 20th ACM international conference on Information and knowledge management

Enabling provenance on large scale e-science applications
IPAW'06 Proceedings of the 2006 international conference on Provenance and Annotation of Data

Modeling temporal dimensions of semistructured data
Journal of Intelligent Information Systems

Abstract

We present an archiving technique for hierarchical data with key structure. Our approach is based on the notion of timestamps, whereby an element appearing in multiple versions of the database is stored only once, along with a compact description of the versions in which it appears. The basic idea of timestamping was discovered by Driscoll et al. in the context of persistent data structures, where one wishes to track the sequence of changes made to a data structure. We extend this idea to develop an archiving tool for XML data that is capable of providing meaningful change descriptions and can also efficiently support a variety of basic functions concerning the evolution of data, such as retrieval of any specific version from the archive and querying the temporal history of any element. This is in contrast to diff-based approaches, where such operations may require undoing a large number of changes or significant reasoning with the deltas. Surprisingly, our archiving technique does not incur any significant space overhead compared with other approaches. Our experimental results support this and also show that the compacted archive file interacts well with other compression techniques. Finally, another useful property of our approach is that the resulting archive is also in XML and hence can directly leverage existing XML tools.
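
The timestamping idea in the abstract can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the paper's tool: the `Archive` class, its method names, the flat "key path to value" snapshot model, and the interval encoding of timestamps are all hypothetical choices made for this example, and real archives in this setting are hierarchical XML documents. The sketch stores each keyed value once, records the version ranges in which it appears, and supports retrieving a version and querying the history of an element.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a snapshot maps a key path (identifying an element)
# to that element's value. Real archives would be hierarchical XML.
Key = Tuple[str, ...]
Interval = Tuple[int, int]  # inclusive range of version numbers


class Archive:
    """Toy timestamp-based archive: each (key, value) pair is stored once,
    together with the version intervals in which it appears."""

    def __init__(self) -> None:
        self.entries: Dict[Key, List[Tuple[str, List[Interval]]]] = {}
        self.latest = 0

    def add_version(self, snapshot: Dict[Key, str]) -> int:
        """Merge a new snapshot into the archive and return its version number."""
        self.latest += 1
        v = self.latest
        for key, value in snapshot.items():
            history = self.entries.setdefault(key, [])
            for stored_value, intervals in history:
                if stored_value == value:
                    last_lo, last_hi = intervals[-1]
                    if last_hi == v - 1:
                        # Value unchanged since the previous version: extend.
                        intervals[-1] = (last_lo, v)
                    else:
                        # Value reappears after a gap: start a new interval.
                        intervals.append((v, v))
                    break
            else:
                # First time this value is seen for this key.
                history.append((value, [(v, v)]))
        return v

    def get_version(self, v: int) -> Dict[Key, str]:
        """Reconstruct snapshot v by selecting entries whose timestamps cover v."""
        return {
            key: value
            for key, history in self.entries.items()
            for value, intervals in history
            if any(lo <= v <= hi for lo, hi in intervals)
        }

    def history(self, key: Key) -> List[Tuple[str, List[Interval]]]:
        """Return every value the element has had, with its version intervals."""
        return [(value, list(intervals)) for value, intervals in self.entries.get(key, [])]
```

In this sketch, retrieving a version is a single filtering pass over the archive and an element's history is a direct lookup, so neither operation needs to replay or undo a chain of deltas as a diff-based store would.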