Supporting XML based high-level abstractions on HDF5 datasets: a case study in automatic data virtualization

  • Authors:
  • Swarup Kumar Sahoo;Gagan Agrawal

  • Affiliations:
  • Department of Computer Science and Engineering, Ohio State University, Columbus, OH;Department of Computer Science and Engineering, Ohio State University, Columbus, OH

  • Venue:
  • LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
  • Year:
  • 2004

Abstract

Our recent work has focused on automatic data virtualization, whose goal is to automatically create efficient data services that support a high-level, or virtual, view of the data. Application developers express their processing against this virtual view, while the data itself is stored in a low-level format. The compiler uses information about the low-level layout, and about the relationship between the virtual and low-level layouts, to generate efficient low-level data processing code. In this paper, we describe a specific implementation of this approach. We provide XML-based abstractions on datasets stored in the Hierarchical Data Format (HDF5). A high-level XML Schema provides a logical view of the HDF5 dataset, hiding the actual layout details. Based on this view, the processing is specified in XQuery, the XML query language developed by the World Wide Web Consortium (W3C). The HDF5 data layout is exposed to the compiler through a low-level XML Schema, and the relationship between the high-level and low-level Schemas is exposed through a Mapping Schema. We describe how our compiler uses this information to generate efficient code to access and process HDF5 datasets. We also address a number of issues in ensuring high locality when processing the datasets; these issues arise mainly because of the high-level nature of XQuery and because the actual data layout is abstracted.
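As a loose illustration of the idea, and not the authors' compiler or its generated code, the following Python sketch contrasts a query written against a virtual view with a layout-aware version of the same query. Everything here is an assumption made for illustration: the `LowLevelLayout` class stands in for an HDF5 dataset stored as a flat row-major array, `offset` plays the role of the mapping between the two views, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class LowLevelLayout:
    """Hypothetical stand-in for a stored dataset: a flat, row-major array."""
    rows: int
    cols: int
    data: List[float]  # length must be rows * cols


def offset(layout: LowLevelLayout, row: int, col: int) -> int:
    """Hypothetical mapping from a virtual (row, col) record to a flat offset."""
    return row * layout.cols + col


def virtual_records(layout: LowLevelLayout) -> Iterator[Tuple[int, int, float]]:
    """Virtual view: a sequence of (row, col, value) records, layout hidden."""
    for row in range(layout.rows):
        for col in range(layout.cols):
            yield row, col, layout.data[offset(layout, row, col)]


def column_sums_naive(layout: LowLevelLayout) -> List[float]:
    """Direct translation of a per-column query on the virtual view.

    Each column triggers a full scan of the view, so the stored data is
    traversed `cols` times.
    """
    return [
        sum(v for _, col, v in virtual_records(layout) if col == c)
        for c in range(layout.cols)
    ]


def column_sums_layout_aware(layout: LowLevelLayout) -> List[float]:
    """Same query, restructured using knowledge of the layout.

    One pass over the flat array in storage order, accumulating all
    column sums at once.
    """
    sums = [0.0] * layout.cols
    for row in range(layout.rows):
        base = offset(layout, row, 0)
        for col in range(layout.cols):
            sums[col] += layout.data[base + col]
    return sums


if __name__ == "__main__":
    layout = LowLevelLayout(rows=2, cols=3, data=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    print(column_sums_naive(layout))
    print(column_sums_layout_aware(layout))
```

Both functions compute the same result; the second reads the stored array once, in storage order, which is the kind of locality concern the abstract refers to.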