Improving Access to Multi-dimensional Self-describing Scientific Datasets

  • Authors:
  • Beomseok Nam; Alan Sussman


  • Venue:
  • CCGRID '03 Proceedings of the 3rd International Symposium on Cluster Computing and the Grid
  • Year:
  • 2003

Abstract

Applications that query into very large multi-dimensional datasets are becoming more common. Many self-describing scientific data file formats have also emerged, which have structural metadata to help navigate the multi-dimensional arrays that are stored in the files. The files may also contain application-specific semantic metadata. In this paper, we discuss efficient methods for performing searches for subsets of multi-dimensional data objects, using semantic information to build multi-dimensional indexes, and grouping data items into properly sized chunks to maximize disk I/O bandwidth. This work is the first step in the design and implementation of a generic indexing library that will work with various high-dimensional scientific data file formats containing semantic information about the stored data. To validate the approach, we have implemented indexing structures for NASA remote sensing data stored in the HDF format with a specific schema (HDF-EOS), and show the performance improvements that are gained from indexing the datasets, compared to using the existing HDF library for accessing the data.
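The abstract does not spell out the index structure, so the following is only a minimal sketch of the two ideas it names, under stated assumptions: point records tagged with hypothetical (lat, lon, time) semantic coordinates are grouped into fixed-size chunks, and each chunk's bounding box acts as a coarse index that lets a range query skip chunks that cannot match. It uses a flat, one-level list of bounding boxes rather than a tree index such as an R-tree, and it does not call the HDF or HDF-EOS libraries. All names (`Chunk`, `build_chunks`, `range_query`, `chunk_size`) are hypothetical and are not the authors' API.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical record layout: (lat, lon, time, value). Real HDF-EOS swaths keep
# geolocation in separate arrays; per-record tuples are a simplifying assumption.
Record = Tuple[float, float, float, float]
Box = Tuple[Tuple[float, float], ...]  # one (min, max) pair per coordinate dimension

DIMS = 3  # lat, lon, time are the indexed (semantic) coordinates


@dataclass
class Chunk:
    mbr: Box               # minimum bounding rectangle of the chunk's records
    records: List[Record]  # the records stored together in this chunk


def mbr_of(records: Sequence[Record]) -> Box:
    """Per-dimension (min, max) over the coordinate fields of the records."""
    return tuple(
        (min(r[d] for r in records), max(r[d] for r in records))
        for d in range(DIMS)
    )


def build_chunks(records: Sequence[Record], chunk_size: int) -> List[Chunk]:
    """Sort on (time, lat, lon) so nearby records share a chunk, then cut
    the sorted sequence into chunks of at most chunk_size records."""
    ordered = sorted(records, key=lambda r: (r[2], r[0], r[1]))
    chunks = []
    for i in range(0, len(ordered), chunk_size):
        group = ordered[i:i + chunk_size]
        chunks.append(Chunk(mbr_of(group), group))
    return chunks


def overlaps(a: Box, b: Box) -> bool:
    """True if two boxes intersect in every dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))


def contains(box: Box, r: Record) -> bool:
    """True if the record's coordinates fall inside the box."""
    return all(lo <= r[d] <= hi for d, (lo, hi) in enumerate(box))


def range_query(chunks: Sequence[Chunk], query: Box) -> Tuple[List[Record], int]:
    """Return matching records and how many chunks had to be read."""
    hits: List[Record] = []
    chunks_read = 0
    for c in chunks:
        if overlaps(c.mbr, query):  # chunk-level pruning via bounding boxes
            chunks_read += 1
            hits.extend(r for r in c.records if contains(query, r))
    return hits, chunks_read


# Usage: a 10 x 10 lat/lon grid observed at 4 time steps (400 records).
records = [(float(lat), float(lon), float(t), lat * 10.0 + t)
           for t in range(4) for lat in range(10) for lon in range(10)]
chunks = build_chunks(records, chunk_size=100)  # one chunk per time step here
query: Box = ((2.0, 3.0), (0.0, 9.0), (1.0, 1.0))  # lat 2..3, any lon, time 1
hits, chunks_read = range_query(chunks, query)
print(len(hits), chunks_read)
```

In this toy setup only one of the four chunks is read. In practice the chunk size would presumably be tuned to disk I/O characteristics, as the abstract suggests, and the chunk bounding boxes would be organized in a tree so that pruning does not require scanning every chunk.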