XArch: archiving scientific and reference data

Authors:
Heiko Müller;Peter Buneman;Ioannis Koltsidas
Affiliations:
University of Edinburgh, Edinburgh, United Kingdom;University of Edinburgh, Edinburgh, United Kingdom;University of Edinburgh, Edinburgh, United Kingdom
Venue:
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Year:
2008

Abstract

Database archiving is important for the retrieval of old versions of a database and for temporal queries over the history of data. We demonstrate XArch, a management system for maintaining, populating, and querying archives of hierarchical data. XArch is based on a nested merge approach that efficiently stores multiple versions of hierarchical data in a compact archive. By merging elements into one data structure, any specific version is retrievable from the archive in a single pass over the data and efficient tracking of object history is possible. XArch implements this approach and extends it in two important ways. First, in order to merge large hierarchical data sets, elements need to be sorted according to their key values. We developed an efficient algorithm for sorting hierarchical data in secondary storage and modified the nested merge algorithm accordingly. Second, we designed and implemented a declarative query language that enables one both to view data from particular versions and to track the history of objects. We demonstrate this using both molecular biology and demographic reference data as examples.
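
The nested merge idea described above can be illustrated with a minimal in-memory sketch. This is not XArch's implementation or its query language: the `Node` class, the functions `merge`, `snapshot`, and `history`, and the sample demographic values are all hypothetical names and data chosen for illustration. The sketch assumes each version is a nested dictionary whose keys play the role of element key values, and it sorts keys in memory rather than in secondary storage.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical archive element, annotated with the versions in which it exists."""
    key: str
    versions: set = field(default_factory=set)    # versions containing this element
    values: dict = field(default_factory=dict)    # leaf value -> set of versions
    children: dict = field(default_factory=dict)  # child key -> Node


def merge(node, data, version):
    """Fold one version of the data into the archive (nested merge by key)."""
    node.versions.add(version)
    if isinstance(data, dict):
        # Visit children in key order; a real system would sort externally.
        for key in sorted(data):
            child = node.children.setdefault(key, Node(key))
            merge(child, data[key], version)
    else:
        # Identical leaf values from different versions are stored only once.
        node.values.setdefault(data, set()).add(version)


def snapshot(node, version):
    """Rebuild one version in a single pass over the archive."""
    for value, versions in node.values.items():
        if version in versions:
            return value
    return {key: snapshot(child, version)
            for key, child in node.children.items()
            if version in child.versions}


def history(root, path):
    """Return (version, value) pairs for the leaf element reached by a key path."""
    node = root
    for key in path:
        node = node.children[key]  # raises KeyError if the element never existed
    return sorted((v, value) for value, vs in node.values.items() for v in vs)


# Two made-up versions of a small demographic data set.
v1 = {"country:FR": {"capital": "Paris", "population": "64.0M"}}
v2 = {"country:FR": {"capital": "Paris", "population": "64.4M"},
      "country:IS": {"population": "0.3M"}}

archive = Node("root")
merge(archive, v1, 1)
merge(archive, v2, 2)
```

In this sketch, `snapshot(archive, 1)` reconstructs the first version, and `history(archive, ["country:FR", "population"])` lists how one value changed across versions. The unchanged capital is stored once with a version set rather than once per version, which is the source of the archive's compactness in this simplified model.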