Datalog redux: experience and conjecture

  • Author: Joseph M. Hellerstein
  • Affiliation: UC Berkeley, Berkeley, CA, USA
  • Venue: Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2010)
  • Year: 2010

Abstract

There is growing urgency in computer science circles regarding an impending crisis in parallel programming. Emerging computing platforms, from multicore processors to cloud computing, predicate their performance growth on the development of software to harness parallelism. For the first time in the history of computing, the progress of Moore's Law depends on the productivity of software engineers. Unfortunately, parallel and distributed programming today is challenging even for the best programmers, and simply unworkable for the majority. There has never been a more urgent need for breakthroughs in programming models and languages.

While parallel programming in general is considered very difficult, data parallelism has been very successful. The relational algebra parallelizes easily over large datasets, and SQL programmers have long reaped the benefits of parallelism without modifications to their code. This point has been rediscovered and amplified via recent enthusiasm for MapReduce programming and "Big Data", which have turned data parallelism into common culture across computing. As a result, it is increasingly attractive to tackle the challenge of parallel programming on the firm common ground of data parallelism: start with an easy-to-parallelize kernel, the relational algebra, and extend it to general-purpose computation. This approach has clear precedents in database theory, where it has long been known that classical relational languages have natural Turing-complete extensions.

At the same time that this crisis has been evolving, variants of Datalog have been cropping up in a wide range of practical settings, from security to robotics to compiler analysis. Over the past seven years, we have been exploring the use of Datalog-inspired languages in a variety of systems projects, with a focus on inherently parallel tasks in networking and distributed systems. The experience has been largely positive: we have demonstrated full-featured Datalog-based system implementations that are orders of magnitude more compact than equivalent imperatively-implemented systems, with competitive performance and significantly accelerated software evolution. Evidence is mounting that Datalog can serve as the basis of a much simpler family of languages for programming serious parallel and distributed software.

This raises many questions that should warm the heart of a database theoretician. How does the complexity hierarchy of logic languages relate to parallel models of computation? Is there a suitable Coordination Complexity model that captures the realities of modern parallel hardware, where computation is cheap and coordination is expensive? Can the lens of logic provide better focus on what is "hard" to parallelize, what is "embarrassingly parallel", and points in between? Does our understanding of non-monotonic reasoning shed light on the ability of loosely-coupled distributed systems to guarantee eventual consistency? And finally, a question close to the heart of the PODS conference: if Datalog has been The Answer all these years, is parallel and distributed programming The Question it has been waiting for?

In this talk and the paper that accompanies it, I present design patterns that arose in our experience building distributed and parallel software in the style of Datalog, and use them to motivate some initial conjectures relating to the questions above.
The full paper was not available at the time these proceedings were printed, but can be found online by searching for the phrase "Springtime for Datalog".
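As a purely illustrative aside (not drawn from the paper itself), the compactness and parallelism claims above are easiest to appreciate through the textbook two-rule Datalog program for graph reachability, whose recursive rule is just a join and therefore parallelizes like any other relational operator. The Python sketch below evaluates that program by semi-naive fixpoint iteration over an in-memory edge relation; the relation names link/reach and the sample edges are hypothetical.

    # Illustrative sketch: the textbook two-rule Datalog program for reachability,
    #
    #   reach(X, Y) :- link(X, Y).
    #   reach(X, Z) :- link(X, Y), reach(Y, Z).
    #
    # evaluated by semi-naive fixpoint iteration over an in-memory relation.
    # The relation names `link`/`reach` and the example edges are hypothetical.

    def transitive_closure(link):
        """Compute reach = the transitive closure of the binary relation `link`."""
        reach = set(link)          # base rule: every link fact is a reach fact
        delta = set(link)          # facts newly derived in the previous round
        while delta:
            # recursive rule: join link with the newest reach facts on the middle variable
            new_facts = {(x, z) for (x, y) in link for (y2, z) in delta if y == y2}
            delta = new_facts - reach   # keep only genuinely new derivations
            reach |= delta              # accumulate until a fixpoint is reached
        return reach

    if __name__ == "__main__":
        # hypothetical four-edge graph
        link = {("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")}
        for fact in sorted(transitive_closure(link)):
            print("reach", fact)

Run on the four-edge example, the sketch prints the six reach facts of the transitive closure. The same fixpoint-of-joins idea underlies the Datalog-inspired systems languages mentioned in the abstract, where the joined relations are distributed state such as routing tables rather than an in-memory set.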