Data-intensive applications often require exploratory analysis of large datasets. When the analysis runs on distributed resources, data locality can be crucial to achieving high throughput and performance. We propose a "data diffusion" approach that acquires compute and storage resources dynamically, replicates data in response to demand, and schedules computations close to data. As demand increases, more resources are acquired, allowing faster responses to subsequent requests that refer to the same data; when demand drops, resources are released. Depending on workload and resource characteristics, this approach can provide the benefits of dedicated hardware without the associated high costs. The approach is reminiscent of cooperative caching, web caching, and peer-to-peer storage systems, but it addresses different application demands. Other data-aware scheduling approaches assume dedicated resources, which can be expensive, inefficient, or both when load varies significantly. To explore the feasibility of data diffusion, we have extended the Falkon resource provisioning and task scheduling system to support data caching and data-aware scheduling. Performance results from both micro-benchmarks and a large-scale astronomy application demonstrate that our approach improves performance relative to alternative approaches and provides improved scalability, with aggregate I/O bandwidth scaling linearly with the number of data cache nodes.
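The abstract describes data diffusion only at a high level. The following Python sketch is a toy, single-process illustration of the idea under our own assumptions; it is not Falkon's API or its actual scheduling policies. Nodes are acquired in proportion to the queued demand, each task is preferentially dispatched to a free node that already caches its input file (a cache hit), a node that runs a task keeps a copy of that file (replication in response to demand), and nodes idle for too long are released. The names `LRUCache`, `Node`, `DataDiffusionPool`, the parameters `cache_capacity`, `max_nodes`, `tasks_per_node`, `idle_limit`, the LRU eviction policy, and the queue-length provisioning rule are all hypothetical.

```python
import math
from collections import OrderedDict, deque
from dataclasses import dataclass


class LRUCache:
    """Per-node file cache with least-recently-used eviction (assumed policy)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._files = OrderedDict()  # file name -> None, ordered by recency

    def __contains__(self, name: str) -> bool:
        return name in self._files

    def add(self, name: str) -> None:
        """Insert or refresh a file; evict the least recently used one if full."""
        if name in self._files:
            self._files.move_to_end(name)
            return
        self._files[name] = None
        if len(self._files) > self.capacity:
            self._files.popitem(last=False)


@dataclass
class Node:
    name: str
    cache: LRUCache
    idle_rounds: int = 0


class DataDiffusionPool:
    """Hypothetical sketch: grow the node pool with demand, dispatch tasks to
    nodes that already cache their input, release nodes when demand drops."""

    def __init__(self, cache_capacity=2, max_nodes=4, tasks_per_node=2, idle_limit=1):
        self.cache_capacity = cache_capacity
        self.max_nodes = max_nodes
        self.tasks_per_node = tasks_per_node
        self.idle_limit = idle_limit
        self.nodes = []        # currently acquired nodes
        self.queue = deque()   # pending tasks, each named by its input file
        self.hits = 0
        self.misses = 0
        self._next_id = 0

    def submit(self, *files: str) -> None:
        self.queue.extend(files)

    def _provision(self) -> None:
        # Acquire nodes in proportion to queued demand, up to max_nodes.
        wanted = min(self.max_nodes, math.ceil(len(self.queue) / self.tasks_per_node))
        while len(self.nodes) < wanted:
            self.nodes.append(Node(f"node{self._next_id}", LRUCache(self.cache_capacity)))
            self._next_id += 1

    def step(self) -> None:
        """One scheduling round: each node runs at most one task."""
        self._provision()
        free = list(self.nodes)
        waiting = deque()
        for f in self.queue:
            if not free:
                waiting.append(f)
                continue
            # Data-aware choice: prefer a free node that already caches f.
            node = next((n for n in free if f in n.cache), None)
            if node is None:
                node = free[0]  # cache miss: f would be fetched from shared storage
                self.misses += 1
            else:
                self.hits += 1
            node.cache.add(f)  # the node keeps a replica of f for later tasks
            node.idle_rounds = 0
            free.remove(node)
        for n in free:
            n.idle_rounds += 1
        self.queue = waiting
        # Release nodes idle for more than idle_limit rounds (their caches go too).
        self.nodes = [n for n in self.nodes if n.idle_rounds <= self.idle_limit]


if __name__ == "__main__":
    pool = DataDiffusionPool()
    pool.submit("a", "b", "a", "b")
    pool.step()  # two nodes acquired; both tasks miss and seed the caches
    pool.step()  # repeated requests for "a" and "b" hit the cached replicas
    print(pool.hits, pool.misses, len(pool.nodes))  # prints: 2 2 2
    pool.step()
    pool.step()  # demand has dropped, so idle nodes are released
    print(len(pool.nodes))  # prints: 0
```

The usage run shows the trade-off the abstract describes: later requests for the same data become cache hits once replicas exist, but releasing idle nodes also discards their cached data. The greedy hit-first dispatch is only one simple stand-in for a data-aware policy; a real system would also weigh node load and transfer cost.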