Cluster computing, recursion and datalog

  • Authors:
  • Foto N. Afrati; Vinayak Borkar; Michael Carey; Neoklis Polyzotis; Jeffrey D. Ullman

  • Affiliations:
  • National Technical University of Athens, Greece; UC Irvine; UC Irvine; UC Santa Cruz; Stanford University

  • Venue:
  • Datalog'10 Proceedings of the First international conference on Datalog Reloaded
  • Year:
  • 2010

Abstract

The cluster-computing environment typified by Hadoop, the open-source implementation of map-reduce, is receiving serious attention as the way to execute queries and other operations on very large-scale data. Datalog execution presents several unusual issues for this environment. We discuss the best way to execute a round of seminaive evaluation on a computing cluster using map-reduce. Using transitive closure as an example, we examine the cost of executing recursions in several different ways. Recursive processes such as evaluation of a recursive Datalog program do not fit the key map-reduce assumption that tasks deliver output only when they are completed. As a result, the resilience under compute-node failure that is a key element of the map-reduce framework is not supported for recursive programs. We discuss extensions to this framework that are suitable for executing recursive Datalog programs on very large-scale data in a way that allows progress to continue after node failures, without restarting the entire job.
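
As a rough illustration of the seminaive evaluation mentioned in the abstract, the following single-machine Python sketch computes transitive closure for the right-linear rules path(x, y) :- arc(x, y) and path(x, y) :- path(x, z), arc(z, y). It is not the paper's cluster implementation: the function name `transitive_closure_seminaive` and the in-memory `succ` index are assumptions made for the example. On a cluster, the join performed in each round would presumably be one map-reduce job keyed on the shared variable z.

```python
def transitive_closure_seminaive(arcs):
    """Return the transitive closure of a collection of (src, dst) arcs.

    Hypothetical helper for illustration; evaluates
        path(x, y) :- arc(x, y).
        path(x, y) :- path(x, z), arc(z, y).
    using seminaive evaluation: each round joins only the facts that
    were new in the previous round (delta) with the arc relation.
    """
    arcs = set(arcs)

    # Index arcs by source node so the join on z is a dictionary lookup.
    succ = {}
    for src, dst in arcs:
        succ.setdefault(src, set()).add(dst)

    closure = set(arcs)  # all path facts derived so far
    delta = set(arcs)    # path facts that were new in the last round

    while delta:
        # Join delta(x, z) with arc(z, y) to derive candidate path(x, y).
        derived = {(x, y) for (x, z) in delta for y in succ.get(z, ())}
        # Keep only facts not seen before; these drive the next round.
        delta = derived - closure
        closure |= delta

    return closure
```

For example, calling `transitive_closure_seminaive([(1, 2), (2, 3), (3, 4)])` adds the pairs (1, 3), (2, 4) and (1, 4) to the three input arcs. Restricting the join to `delta` rather than the whole of `closure` is what distinguishes seminaive from naive evaluation, since facts derived in earlier rounds are not rederived from scratch.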