Compiler support for efficient processing of XML datasets

Authors:
Xiaogang Li;Renato Ferreira;Gagan Agrawal
Affiliations:
Ohio State University, Columbus, OH;Universidade Federal de Minas Gerais, Brasil;Ohio State University, Columbus, OH
Venue:
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Year:
2003

Abstract

Declarative, high-level, and/or application-class specific languages are often successful in easing application development. In this paper, we report our experiences in compiling a recently developed XML query language, XQuery, for applications that process scientific datasets.

Though scientific data processing applications can be conveniently represented in XQuery, compiling them to achieve efficient execution involves a number of challenges. These are: 1) analysis of recursive functions to identify reduction computations involving only associative and commutative operations, 2) replacement of recursive functions with iterative constructs, 3) parallelization of generalized reduction functions, which particularly requires the synthesis of global reduction functions, 4) application of data-centric transformations on the structure of XQuery, and 5) translation of XQuery processing to an imperative language like C/C++, which is required for using a middleware that offers low-level functionality.

This paper describes our solutions to these problems. By implementing the techniques in a compiler and generating code for a runtime system called Active Data Repository (ADR), we are able to achieve efficient processing of disk-resident datasets and parallelization on a cluster of machines. Our experimental results show that: 1) restructuring transformations, i.e., removing recursion and applying data-centric execution, result in a severalfold improvement in performance, and 2) parallel versions achieve good load balance and incur no significant overheads besides communication.
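
The first three challenges listed in the abstract can be pictured with a small sketch. The Python below is an illustration only, not the paper's compiler output or the ADR interface: the function names (`recursive_accumulate`, `iterative_accumulate`, `parallel_accumulate`) and the strided partitioning are assumptions made for this example. It shows a reduction written recursively, as one might write it in XQuery, the equivalent loop, and a partitioned version in which per-partition results are merged by a global reduction step. The rewrites preserve the result only because the operator is associative and commutative.

```python
from operator import add


def recursive_accumulate(items, op, identity):
    """Recursive form, in the style of a recursive XQuery function (right fold)."""
    if not items:
        return identity
    return op(items[0], recursive_accumulate(items[1:], op, identity))


def iterative_accumulate(items, op, identity):
    """Loop form (left fold); matches the recursive form when op is associative."""
    result = identity
    for item in items:
        result = op(result, item)
    return result


def parallel_accumulate(items, op, identity, num_partitions):
    """Local reduction per partition, then a global reduction over the partials.

    Partitions are processed one after another here; they stand in for the
    nodes of a cluster. Strided partitioning reorders the elements, so op
    must also be commutative for the result to be unchanged.
    """
    partitions = [items[i::num_partitions] for i in range(num_partitions)]
    partials = [iterative_accumulate(p, op, identity) for p in partitions]
    # Hypothetical global reduction step: combine the per-partition results.
    return iterative_accumulate(partials, op, identity)


values = [3, 1, 4, 1, 5, 9, 2, 6]
total_recursive = recursive_accumulate(values, add, 0)
total_iterative = iterative_accumulate(values, add, 0)
total_parallel = parallel_accumulate(values, add, 0, num_partitions=3)
```

All three calls give the same total for addition. An operator such as subtraction, which is neither associative nor commutative, would give different answers across the three forms, which is why the analysis in challenge 1 has to come before the rewrites in challenges 2 and 3.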