Power provisioning for a warehouse-sized computer
Proceedings of the 34th annual international symposium on Computer architecture
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Distributed Computing Economics
Queue - Object-Relational Mapping
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
X-trace: a pervasive network tracing framework
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Zone-based data striping for cloud storage
IBM Journal of Research and Development
Hi-index | 0.00 |
The rise of ad-hoc data-intensive computing has led to the development of data-parallel programming systems such as Map/Reduce and Hadoop, which achieve scalability by tightly coupling storage and computation. This can be limiting when the ratio of computation to storage is not known in advance, or changes over time. In this work, we examine decoupling storage and computation in Hadoop through SuperDataNodes, which are servers that contain an order of magnitude more disks than traditional Hadoop nodes. We found that SuperDataNodes are not only capable of supporting workloads with high storage-to-processing workloads, but in some cases can outperform traditional Hadoop deployments through better management of a large centralized pool of disks.