Chunked extendible dense arrays for scientific data storage

Authors:
E. J. Otoo; Gideon Nimako; Daniel Ohene-Kwofie
Venue:
Parallel Computing
Year:
2013



Abstract

Several meetings of the Extremely Large Databases Community for large-scale scientific applications advocate the use of multidimensional arrays as the appropriate model for representing scientific databases. Scientific databases gradually grow to massive sizes, on the order of terabytes and petabytes. The storage of such databases therefore requires efficient dynamic storage schemes in which the array is allowed to extend the bounds of its dimensions arbitrarily. Conventional multidimensional array representations cannot extend or shrink their bounds without relocating elements of the data set. In general, extendibility of the bounds is limited to only one dimension. This paper presents a technique for storing dense multidimensional arrays by chunks such that the array can be extended along any dimension without compromising the access time for an element. This is done with a computed access mapping function that maps the k-dimensional index onto a linear index of the storage locations. This concept forms the basis for the implementation of an array file of any number of dimensions, where the bounds of the array dimensions can be extended arbitrarily. Such a feature currently exists in the Hierarchical Data Format version 5 (HDF5). However, extending the bound of a dimension in an HDF5 array file can be unusually expensive in time. In our storage scheme for dense array files, such extensions can be performed while elements of the array are still accessed orders of magnitude faster than in HDF5 or conventional array files. We also present theoretical and experimental analyses of our scheme with respect to access time and storage overhead. Such a mapping scheme can be readily integrated into existing PGAS models for parallel processing in a cluster-networked computing environment.
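To make the idea of a computed mapping function concrete, the following is a minimal sketch of one way a k-dimensional index can be mapped to a linear address while any dimension is allowed to grow. It is an illustration under stated assumptions, not the paper's actual function: the class name `ExtendibleChunkedIndex`, its methods, and the record layout are all hypothetical. The assumption is that each extension appends one contiguous segment of chunks at the end of storage and logs a small record for the extended dimension, so that existing chunks never move.

```python
from bisect import bisect_right
from math import prod


class ExtendibleChunkedIndex:
    """Hypothetical sketch: maps k-dimensional indices to linear addresses."""

    def __init__(self, initial_bounds, chunk_shape):
        self.chunk_shape = tuple(chunk_shape)
        self.k = len(self.chunk_shape)
        # Bounds are kept in units of chunks, rounded up.
        self.bounds = [-(-n // c) for n, c in zip(initial_bounds, self.chunk_shape)]
        # Per dimension: sorted first chunk indices, plus (base address, coefficients).
        # A base of -1 marks a placeholder that never owns a chunk.
        self.starts = [[0] for _ in range(self.k)]
        self.segments = [[(-1, None)] for _ in range(self.k)]
        # The initial allocation is treated as a segment of dimension 0.
        self.segments[0] = [(0, self._coefficients(0))]
        self.n_chunks = prod(self.bounds)

    def _coefficients(self, dim):
        """Row-major multipliers for a segment whose outermost axis is `dim`."""
        coeffs = [0] * self.k
        stride = 1
        for j in reversed(range(self.k)):
            if j != dim:
                coeffs[j] = stride
                stride *= self.bounds[j]
        coeffs[dim] = stride
        return coeffs

    def extend(self, dim, extra_chunks):
        """Grow `dim` by appending a new segment; existing chunks keep their addresses."""
        self.starts[dim].append(self.bounds[dim])
        self.segments[dim].append((self.n_chunks, self._coefficients(dim)))
        others = prod(b for j, b in enumerate(self.bounds) if j != dim)
        self.n_chunks += extra_chunks * others
        self.bounds[dim] += extra_chunks

    def chunk_address(self, chunk_index):
        """Linear address of a chunk: the most recently allocated covering segment owns it."""
        owner = None
        for d, i in enumerate(chunk_index):
            if not 0 <= i < self.bounds[d]:
                raise IndexError("chunk index out of bounds")
            pos = bisect_right(self.starts[d], i) - 1
            base, coeffs = self.segments[d][pos]
            if owner is None or base > owner[2]:
                owner = (d, self.starts[d][pos], base, coeffs)
        d, start, base, coeffs = owner
        offset = sum(coeffs[j] * i for j, i in enumerate(chunk_index) if j != d)
        return base + (chunk_index[d] - start) * coeffs[d] + offset

    def address(self, index):
        """Linear address of an element: chunk address plus row-major offset in the chunk."""
        chunk_index = [i // c for i, c in zip(index, self.chunk_shape)]
        within = 0
        for i, c in zip(index, self.chunk_shape):
            within = within * c + i % c
        return self.chunk_address(chunk_index) * prod(self.chunk_shape) + within
```

For example, starting from a 4 x 4 array stored in 2 x 2 chunks, calling `extend(1, 1)` and then `extend(0, 1)` leaves the four original chunks at addresses 0 to 3 and places the new chunks at addresses 4 to 8. In this sketch a lookup costs one binary search per dimension over that dimension's extension history, so access time depends on the number of extensions rather than on the size of the array.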