Compiler supported high-level abstractions for sparse disk-resident datasets

  • Authors:
  • Renato Ferreira; Gagan Agrawal; Joel Saltz

  • Affiliations:
  • Ohio State University, Columbus OH; Ohio State University, Columbus OH; Ohio State University, Columbus OH

  • Venue:
  • ICS '02 Proceedings of the 16th international conference on Supercomputing
  • Year:
  • 2002

Abstract

Processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. The complexity and irregularity of datasets in many of these domains make developing such processing applications tedious and error-prone.

We propose the use of high-level abstractions to hide the irregularities in these datasets and to enable rapid development of correct data processing applications. We present two execution strategies and a set of compiler analysis techniques for obtaining high performance from applications written using the proposed abstractions. Both execution strategies achieve high locality in disk accesses: once a disk block has been read, all iterations that access any element of that block are performed. To support these strategies and improve performance, we have developed static analysis techniques for: 1) computing the set of iterations that access a particular right-hand-side element, 2) generating a function that can be applied to the meta-data associated with each disk block to determine whether that block needs to be read, and 3) performing code hoisting of conditionals.

We present experimental results from a prototype compiler that implements these techniques to demonstrate the effectiveness of our approach.
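The block-at-a-time execution idea and the meta-data test described in the abstract can be illustrated with a small sketch. This is not the authors' compiler output or runtime. It is a hand-written Python illustration under stated assumptions: each disk block carries a one-dimensional coordinate range as its meta-data, the computation is a range query that sums values into fixed-width output cells, and the names `Block`, `needs_read`, and `process` are hypothetical. In the paper the meta-data function is generated by the compiler; here it is written by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Block:
    """Hypothetical disk block: meta-data plus the elements it stores."""
    lo: float  # meta-data: smallest coordinate stored in the block
    hi: float  # meta-data: largest coordinate stored in the block
    elements: List[Tuple[float, float]]  # (coordinate, value) pairs


def needs_read(block: Block, q_lo: float, q_hi: float) -> bool:
    """Meta-data test: does the block's range overlap the query range?

    Only lo/hi are inspected, so a block can be rejected without
    touching its elements (assumed here to stand in for a disk read).
    """
    return block.hi >= q_lo and block.lo <= q_hi


def process(blocks: List[Block], q_lo: float, q_hi: float,
            cell_width: float) -> Tuple[Dict[int, float], int]:
    """Sum values into output cells, one block at a time.

    Every iteration that uses an element of a block runs while that
    block is in memory, so no block is read more than once.
    """
    out: Dict[int, float] = {}
    blocks_read = 0
    for block in blocks:
        if not needs_read(block, q_lo, q_hi):
            continue  # skipped using meta-data alone
        blocks_read += 1
        for coord, value in block.elements:
            # The per-element check is still needed: a block that
            # overlaps the query may hold elements outside it.
            if q_lo <= coord <= q_hi:
                cell = int((coord - q_lo) // cell_width)
                out[cell] = out.get(cell, 0.0) + value
    return out, blocks_read


if __name__ == "__main__":
    data = [
        Block(0, 4, [(1, 1.0), (3, 2.0)]),
        Block(5, 9, [(5, 10.0), (8, 20.0)]),
        Block(10, 14, [(12, 100.0)]),
    ]
    result, n_read = process(data, q_lo=4, q_hi=9, cell_width=2.5)
    print(result, n_read)
```

In this sketch the loop is driven by the input blocks rather than by the output cells, which is what keeps each block's accesses together. The first block passes the meta-data test because its range touches the query boundary, yet contributes nothing, which shows why the meta-data test is a conservative filter and not a replacement for the per-element condition.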