Compiler support for efficient processing of XML datasets

  • Authors:
  • Xiaogang Li;Renato Ferreira;Gagan Agrawal

  • Affiliations:
  • Ohio State University, Columbus, OH;Universidade Federal de Minas Gerais, Brasil;Ohio State University, Columbus, OH

  • Venue:
  • ICS '03 Proceedings of the 17th annual international conference on Supercomputing
  • Year:
  • 2003

Abstract

Declarative, high-level, and/or application-class specific languages are often successful in easing application development. In this paper, we report our experiences in compiling a recently developed XML query language, XQuery, for applications that process scientific datasets.

Though scientific data processing applications can be conveniently represented in XQuery, compiling them to achieve efficient execution involves a number of challenges. These are: 1) analysis of recursive functions to identify reduction computations involving only associative and commutative operations, 2) replacement of recursive functions with iterative constructs, 3) parallelization of generalized reduction functions, which in particular requires the synthesis of global reduction functions, 4) application of data-centric transformations on the structure of XQuery, and 5) translation of XQuery processing to an imperative language like C/C++, which is required for using a middleware that offers low-level functionality.

This paper describes our solutions to these problems. By implementing the techniques in a compiler and generating code for a runtime system called Active Data Repository (ADR), we are able to achieve efficient processing of disk-resident datasets and parallelization on a cluster of machines. Our experimental results show that: 1) restructuring transformations, i.e., removing recursion and applying data-centric execution, result in several-fold improvements in performance, and 2) parallel versions achieve good load balance and incur no significant overheads besides communication.
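To make the first three challenges concrete, the following is a minimal Python sketch, not the paper's compiler output or the ADR API. It assumes a simple sum as the reduction, and all function names (`recursive_sum`, `iterative_sum`, `parallel_sum`) and the round-robin partitioning are hypothetical, chosen only for illustration. It shows a recursive reduction, an equivalent iterative form, and a partitioned form in which per-partition local results are merged by a global reduction.

```python
from functools import reduce


def recursive_sum(values):
    """Recursive reduction, as a functional query language might express it."""
    if not values:
        return 0
    return values[0] + recursive_sum(values[1:])


def iterative_sum(values):
    """Same reduction rewritten as a loop over an accumulator."""
    acc = 0
    for v in values:
        acc += v
    return acc


def parallel_sum(values, num_partitions):
    """Partitioned reduction: local reductions followed by a global one.

    The round-robin split is a hypothetical stand-in for data distributed
    across cluster nodes; the partitions are processed sequentially here.
    """
    partitions = [values[i::num_partitions] for i in range(num_partitions)]
    local_results = [iterative_sum(p) for p in partitions]
    # Global reduction: merge the partial results with the same operator.
    return reduce(lambda a, b: a + b, local_results, 0)


data = list(range(1, 101))
print(recursive_sum(data), iterative_sum(data), parallel_sum(data, 4))
```

The three versions agree only because addition is associative and commutative, so elements may be regrouped and reordered across partitions. This is why identifying such operations comes first in the list of challenges.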