Cluster computing, recursion and Datalog

Authors:
Foto N. Afrati;Vinayak Borkar;Michael Carey;Neoklis Polyzotis;Jeffrey D. Ullman
Affiliations:
National Technical University of Athens, Greece;UC Irvine;UC Irvine;UC Santa Cruz;Stanford University
Venue:
Datalog'10 Proceedings of the First international conference on Datalog Reloaded
Year:
2010

Abstract

The cluster-computing environment typified by Hadoop, the open-source implementation of map-reduce, is receiving serious attention as the way to execute queries and other operations on very large-scale data. Datalog execution presents several unusual issues for this environment. We discuss the best way to execute a round of seminaive evaluation on a computing cluster using map-reduce. Using transitive closure as an example, we examine the cost of executing recursions in several different ways. Recursive processes, such as the evaluation of a recursive Datalog program, do not fit the key map-reduce assumption that tasks deliver output only when they are completed. As a result, the resilience under compute-node failure that is a key element of the map-reduce framework is not supported for recursive programs. We discuss extensions to this framework that are suitable for executing recursive Datalog programs on very large-scale data in a way that allows progress to continue after node failures, without restarting the entire job.
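
For readers unfamiliar with seminaive evaluation, the following is a minimal single-machine sketch of the idea applied to transitive closure. It is an illustration under stated assumptions, not the paper's cluster algorithm: the function name `seminaive_transitive_closure`, the right-linear rule `path(x, z) :- path(x, y), edge(y, z)`, and the in-memory index are all choices made here for clarity. Each pass of the loop corresponds loosely to one round that a cluster would execute, joining only the facts discovered in the previous round against the edge relation.

```python
from collections import defaultdict


def seminaive_transitive_closure(edges):
    """Return (closure, rounds) for a set of (x, y) edge pairs.

    Hypothetical helper for illustration; assumes the right-linear rule
    path(x, z) :- path(x, y), edge(y, z).
    """
    # Index edges by source node. On a cluster, this grouping by join key
    # is roughly what the map phase would arrange.
    successors = defaultdict(set)
    for x, y in edges:
        successors[x].add(y)

    closure = set(edges)  # every path fact derived so far
    delta = set(edges)    # facts that were new in the previous round
    rounds = 0
    while delta:
        rounds += 1
        # Join only the new facts with edge, never the full closure.
        derived = {
            (x, z)
            for (x, y) in delta
            for z in successors.get(y, ())
        }
        # Keep only facts not seen before; these drive the next round.
        delta = derived - closure
        closure |= delta
    return closure, rounds


if __name__ == "__main__":
    closure, rounds = seminaive_transitive_closure({(1, 2), (2, 3), (3, 4)})
    print(sorted(closure), rounds)
```

The loop stops when a round derives no new facts. Restricting the join to `delta` is what separates seminaive from naive evaluation, which would re-derive every known fact in every round.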