Issues in parallel execution of non-monotonic reasoning systems
Parallel Computing
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads
Proceedings of the VLDB Endowment
A parallel ASP instantiator based on DLV
Proceedings of the 5th ACM SIGPLAN workshop on Declarative aspects of multicore programming
Boom analytics: exploring data-centric, declarative programming for the cloud
Proceedings of the 5th European conference on Computer systems
Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Twister: a runtime for iterative MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
HaLoop: efficient iterative data processing on large clusters
Proceedings of the VLDB Endowment
vSPARQL: A view definition language for the semantic web
Journal of Biomedical Informatics
Map-reduce extensions and recursive queries
Proceedings of the 14th International Conference on Extending Database Technology
Hyracks: A flexible and extensible foundation for data-intensive computing
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Dyna: extending datalog for modern AI
Datalog'10 Proceedings of the First international conference on Datalog Reloaded
Recent advances in declarative networking
PADL'12 Proceedings of the 14th international conference on Practical Aspects of Declarative Languages
Hi-index | 0.00 |
We explore the design and implementation of a scalable Datalog system using Hadoop as the underlying runtime system. Observing that several successful projects provide a relational algebra-based programming interface to Hadoop, we argue that a natural extension is to add recursion to support scalable social network analysis, internet traffic analysis, and general graph query. We implement semi-naive evaluation in Hadoop, then apply a series of optimizations spanning fundamental changes to the Hadoop infrastructure to basic configuration guidelines that collectively offer a 10x improvement in our experiments. This work lays the foundation for a more comprehensive cost-based algebraic optimization framework for parallel recursive Datalog queries.