Optimizing Large-Scale Semi-Naïve Datalog Evaluation in Hadoop

  • Authors: Marianne Shaw; Paraschos Koutris; Bill Howe; Dan Suciu
  • Affiliations: University of Washington (all authors)
  • Venue: Datalog 2.0'12: Proceedings of the Second International Conference on Datalog in Academia and Industry
  • Year: 2012

Abstract

We explore the design and implementation of a scalable Datalog system using Hadoop as the underlying runtime system. Observing that several successful projects provide a relational algebra-based programming interface to Hadoop, we argue that a natural extension is to add recursion to support scalable social network analysis, internet traffic analysis, and general graph queries. We implement semi-naïve evaluation in Hadoop, then apply a series of optimizations, ranging from fundamental changes to the Hadoop infrastructure to basic configuration guidelines, that collectively offer a 10x improvement in our experiments. This work lays the foundation for a more comprehensive cost-based algebraic optimization framework for parallel recursive Datalog queries.
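
For readers unfamiliar with the term, semi-naïve evaluation can be illustrated with a small in-memory sketch. This is not the authors' Hadoop implementation, which the abstract does not detail. It assumes a hypothetical transitive-closure program over an `edge` relation, and the names `transitive_closure_semi_naive`, `reach`, and `delta` are chosen here for illustration only. The point it shows is that each iteration joins only the facts derived in the previous iteration against `edge`, rather than re-joining the whole `reach` relation.

```python
def transitive_closure_semi_naive(edges):
    """Compute reach(x, y) from edge(x, y) using semi-naive evaluation.

    Illustrative rules (assumed, not taken from the paper):
        reach(x, y) :- edge(x, y).
        reach(x, z) :- reach(x, y), edge(y, z).
    """
    edges = set(edges)

    # Index edges by source so the join on y is a dictionary lookup.
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)

    reach = set(edges)  # all facts derived so far
    delta = set(edges)  # facts that were new in the previous iteration

    while delta:
        # Join only the delta with edge, not the full reach relation.
        candidates = {
            (x, z)
            for (x, y) in delta
            for z in successors.get(y, ())
        }
        # Keep only genuinely new facts; they form the next delta.
        delta = candidates - reach
        reach |= delta

    return reach


if __name__ == "__main__":
    print(sorted(transitive_closure_semi_naive([(1, 2), (2, 3), (3, 4)])))
```

The loop stops at the first iteration that derives no new facts, which is the fixpoint. In a MapReduce setting each pass of the loop would plausibly correspond to one or more jobs, but that mapping is an assumption of this sketch, not a claim about the system described above.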