In this demo proposal, we describe REEF, a framework that makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models. We will demonstrate diverse workloads, including extract-transform-load MapReduce jobs, iterative machine learning algorithms, and ad-hoc declarative query processing. At its core, REEF builds atop YARN (Apache Hadoop 2's resource manager) to provide retainable hardware resources with lifetimes that are decoupled from those of computational tasks. This allows us to build persistent (cross-job) caches and cluster-wide services but, more importantly, it supports high-performance iterative graph processing and machine learning algorithms. Unlike existing systems, REEF aims for composability of jobs across computational models, providing significant performance and usability gains, even with legacy code. REEF includes a library of interoperable data management primitives optimized for communication and data movement (which are distinct from storage locality). The library also allows REEF applications to access external services, such as user-facing relational databases. We were careful to decouple lower levels of REEF from the data models and semantics of systems built atop it. The result was two new standalone systems: Tang, a configuration manager and dependency injector, and Wake, a state-of-the-art event-driven programming and data movement framework. Both are language-independent, allowing REEF to bridge the JVM and .NET.
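The idea of retained resources whose lifetimes are decoupled from those of tasks can be illustrated with a small toy model. The sketch below is not REEF's API: `RetainedContainer`, `load_points`, `make_step`, and `run_iterations` are hypothetical names invented for illustration, and the "resource" is simply an in-process object. It shows only why an iterative computation benefits when state cached by one task remains available to the next task on the same resource.

```python
from typing import Any, Callable, Dict, List


class RetainedContainer:
    """Hypothetical stand-in for a resource that outlives the tasks run on it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, Any] = {}  # state that survives across tasks
        self.tasks_run = 0
        self.released = False

    def submit(self, task: Callable[[Dict[str, Any]], Any]) -> Any:
        """Run one task to completion; the container (and its cache) stays alive."""
        if self.released:
            raise RuntimeError(f"container {self.name} was already released")
        self.tasks_run += 1
        return task(self.cache)

    def release(self) -> None:
        """End the resource's lifetime explicitly, independent of any task."""
        self.cache.clear()
        self.released = True


def load_points(cache: Dict[str, Any]) -> List[float]:
    """Load the input once per container; later tasks reuse the cached copy."""
    if "points" not in cache:
        cache["loads"] = cache.get("loads", 0) + 1  # count simulated loads
        cache["points"] = [1.0, 2.0, 3.0, 4.0]      # assumed toy dataset
    return cache["points"]


def make_step(estimate: float) -> Callable[[Dict[str, Any]], float]:
    """Build one iteration task: move the estimate halfway toward the data mean."""
    def step(cache: Dict[str, Any]) -> float:
        points = load_points(cache)
        mean = sum(points) / len(points)
        return estimate + 0.5 * (mean - estimate)
    return step


def run_iterations(container: RetainedContainer, rounds: int) -> float:
    """Submit one short-lived task per round to the same retained container."""
    estimate = 0.0
    for _ in range(rounds):
        estimate = container.submit(make_step(estimate))
    return estimate
```

In this toy model every round is a separate task, yet the input is loaded only once because the cache belongs to the container rather than to any task. If each task instead ran on a fresh resource, the load would repeat every iteration.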