Measurements of a distributed file system
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
File system usage in Windows NT 4.0
Proceedings of the seventeenth ACM symposium on Operating systems principles
TCP Nice: a mechanism for background transfers
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Experiences with MapReduce, an abstraction for large-scale computation
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Making cloud intermediate data fault-tolerant
Proceedings of the 1st ACM symposium on Cloud computing
MOON: MapReduce On Opportunistic eNvironments
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Conductor: orchestrating the clouds
Proceedings of the 4th International Workshop on Large Scale Distributed Systems and Middleware
See spot run: using spot instances for mapreduce workflows
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Efficient provable data possession for hybrid clouds
Proceedings of the 17th ACM conference on Computer and communications security
Optimizing intermediate data management in MapReduce computations
Proceedings of the First International Workshop on Cloud Computing Platforms
Mesos: a platform for fine-grained resource sharing in the data center
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Resource provisioning framework for mapreduce jobs with performance goals
Middleware'11 Proceedings of the 12th ACM/IFIP/USENIX international conference on Middleware
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Reliable MapReduce computing on opportunistic resources
Cluster Computing
Resource provisioning framework for MapReduce jobs with performance goals
Proceedings of the 12th International Middleware Conference
Optimus: a dynamic rewriting framework for data-parallel execution plans
Proceedings of the 8th ACM European Conference on Computer Systems
Data-Intensive Cloud Computing: Requirements, Expectations, Challenges, and Solutions
Journal of Grid Computing
Hi-index | 0.00 |
This paper takes a renewed look at the problem of managing intermediate data that is generated during dataflow computations (e.g., MapReduce, Pig, Dryad, etc.) within clouds. We discuss salient features of this intermediate data and outline requirements for a solution. Our experiments show that existing local write-remote read solutions, traditional distributed file systems (e.g., HDFS), and support from transport protocols (e.g., TCP-Nice) cannot guarantee both data availability and minimal interference, which are our key requirements. We present design ideas for a new intermediate data storage system.