Incremental computation via function caching
POPL '89 Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Static caching for incremental computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Caching function calls using precise dependencies
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Efficient incremental view maintenance in data warehouses
Proceedings of the tenth international conference on Information and knowledge management
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Data Mining and Knowledge Discovery
Automated Selection of Materialized Views and Indexes in SQL Databases
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Deriving Production Rules for Incremental View Maintenance
VLDB '91 Proceedings of the 17th International Conference on Very Large Data Bases
Answering queries using views: A survey
The VLDB Journal — The International Journal on Very Large Data Bases
Algorithmic Graph Theory and Perfect Graphs (Annals of Discrete Mathematics, Vol 57)
Algorithmic Graph Theory and Perfect Graphs (Annals of Discrete Mathematics, Vol 57)
Software Configuration Management System Using Vesta (Monographs in Computer Science)
Software Configuration Management System Using Vesta (Monographs in Computer Science)
Interpreting the data: Parallel analysis with Sawzall
Scientific Programming - Dynamic Grids and Worldwide Computing
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Automatic optimization of parallel dataflow programs
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
Scheduling shared scans of large data files
Proceedings of the VLDB Endowment
SCOPE: easy and efficient parallel processing of massive data sets
Proceedings of the VLDB Endowment
Distributed aggregation for data-parallel computing: interfaces and implementations
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Boom analytics: exploring data-centric, declarative programming for the cloud
Proceedings of the 5th European conference on Computer systems
Stateful bulk processing for incremental analytics
Proceedings of the 1st ACM symposium on Cloud computing
Comet: batched stream processing for data intensive distributed computing
Proceedings of the 1st ACM symposium on Cloud computing
DryadInc: reusing work in large-scale computations
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Scarlett: coping with skewed content popularity in mapreduce clusters
Proceedings of the sixth conference on Computer systems
CIEL: a universal execution engine for distributed data-flow computing
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Steno: automatic optimization of declarative queries
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
TidyFS: a simple and small distributed file system
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Incoop: MapReduce for incremental computations
Proceedings of the 2nd ACM Symposium on Cloud Computing
The purge threat: scientists' thoughts on peta-scale usability
Proceedings of the sixth workshop on Parallel Data Storage
Kineograph: taking the pulse of a fast-changing and connected world
Proceedings of the 7th ACM european conference on Computer Systems
Large-scale incremental data processing with change propagation
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
Shredder: GPU-accelerated incremental storage and computation
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Re-optimizing data-parallel computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Putting a "big-data" platform to good use: training kinect
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Facilitating real-time graph mining
Proceedings of the fourth international workshop on Cloud data management
Streaming big data with self-adjusting computation
DDFP '13 Proceedings of the 2013 workshop on Data driven functional programming
TimeStream: reliable stream computation in the cloud
Proceedings of the 8th ACM European Conference on Computer Systems
Optimus: a dynamic rewriting framework for data-parallel execution plans
Proceedings of the 8th ACM European Conference on Computer Systems
DeepSea: self-adaptive data partitioning and replication in scalable distributed data systems
Proceedings of the 2013 Sigmod/PODS Ph.D. symposium on PhD symposium
(Big)data in a virtualized world: volume, velocity, and variety in cloud datacenters
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies
Hi-index | 0.00 |
Managing data and computation is at the heart of datacenter computing. Manual management of data can lead to data loss, wasteful consumption of storage, and laborious bookkeeping. Lack of proper management of computation can result in lost opportunities to share common computations across multiple jobs or to compute results incrementally. Nectar is a system designed to address the aforementioned problems. It automates and unifies the management of data and computation within a datacenter. In Nectar, data and computation are treated interchangeably by associating data with its computation. Derived datasets, which are the results of computations, are uniquely identified by the programs that produce them, and together with their programs, are automatically managed by a datacenter wide caching service. Any derived dataset can be transparently regenerated by reexecuting its program, and any computation can be transparently avoided by using previously cached results. This enables us to greatly improve datacenter management and resource utilization: obsolete or infrequently used derived datasets are automatically garbage collected, and shared common computations are computed only once and reused by others. This paper describes the design and implementation of Nectar, and reports on our evaluation of the system using analytic studies of logs from several production clusters and an actual deployment on a 240-node cluster.