Representing MapReduce optimisations in the nested relational calculus

  • Authors:
  • Marek Grabowski; Jan Hidders; Jacek Sroka

  • Affiliations:
  • Institute of Informatics, University of Warsaw, Poland; Delft University of Technology, The Netherlands; Institute of Informatics, University of Warsaw, Poland

  • Venue:
  • BNCOD'13 Proceedings of the 29th British National conference on Big Data
  • Year:
  • 2013

Abstract

The MapReduce (MR) programming model has recently attracted a lot of attention from both academic and business researchers. Systems based on this model hide communication and synchronization issues from the user and allow processing of high volumes of data on thousands of commodity computers. In this paper we are interested in applying MR to processing hierarchical data with nested collections, such as data stored in JSON or XML formats, but with restricted nesting depth, as is usual in the nested relational model. Current data analytics systems often propose ad hoc formalisms to represent query evaluation plans and to optimize their execution. In this paper we argue that the Nested Relational Calculus (NRC) provides a general, elegant and effective way to describe and investigate these optimizations. It allows both classical and MapReduce-specific optimizations to be described and combined. We demonstrate this by showing that MapReduce programs can be expressed straightforwardly in NRC by adding syntactic shorthands. In addition, we show that optimizations in existing systems can be readily represented in this extended formalism.