CernVM-FS: delivering scientific software to globally distributed computing resources

Authors:
Jakob Blomer; Predrag Buncic; Thomas Fuhrmann
Affiliations:
CERN, Geneva, Switzerland; PH-SFT, Geneva, Switzerland; Technische Universität München, München, Germany
Venue:
Proceedings of the first international workshop on Network-aware data management
Year:
2011

Abstract

The computing facilities used to process data for the experiments at the Large Hadron Collider (LHC) at CERN are scattered around the world. The embarrassingly parallel workload allows the use of a wide range of computing resources, such as Grid sites of the Worldwide LHC Computing Grid, commercial and institutional cloud resources, and individual home PCs in "volunteer clouds". Unlike the data, the experiment software cannot easily be split into small work units. Efficient delivery of the complex and frequently changing experiment software is therefore a crucial step in harnessing heterogeneous resources. Here we present an approach that delivers software on demand through a scalable hierarchy of standard HTTP caches. We show how to tackle this problem by pre-processing software into content-addressable storage. On the worker nodes, we use a specially crafted file system that ensures data integrity and provides fault tolerance. We show performance figures from a large-scale deployment. For the most common case of computing clusters with 10 to 1000 worker nodes, we present a novel state dissemination protocol to support a fully decentralized, distributed memory cache.
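
The idea of pre-processing software into content-addressable storage can be illustrated with a minimal sketch. It is not the CernVM-FS implementation: the function names `publish` and `fetch`, the in-memory dictionaries standing in for the object store and the file catalog, and the choice of SHA-256 and zlib are all assumptions made for illustration. Each file is stored under the hash of its content, so identical files are stored once, stored objects never change (which suits standard HTTP caches), and a client can check integrity by re-hashing what it receives.

```python
import hashlib
import zlib


def publish(files, store):
    """Pre-process a software tree into a content-addressed object store.

    files: dict mapping a file path to its content (bytes).
    store: dict standing in for the object store (hash -> compressed bytes).
    Returns a catalog mapping each path to the hash of its content.
    Hypothetical helper; SHA-256 and zlib are illustrative choices.
    """
    catalog = {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        # Identical content is stored only once, whatever its path.
        store.setdefault(digest, zlib.compress(content))
        catalog[path] = digest
    return catalog


def fetch(path, catalog, store):
    """Look up a path, fetch the object by hash, and verify its integrity."""
    digest = catalog[path]
    content = zlib.decompress(store[digest])
    # The object name is the hash of its content, so the client can detect
    # corruption or tampering without trusting the cache that served it.
    if hashlib.sha256(content).hexdigest() != digest:
        raise IOError("integrity check failed for " + path)
    return content


if __name__ == "__main__":
    object_store = {}
    tree = {
        "lib/v1/libfoo.so": b"binary-blob",
        "lib/v2/libfoo.so": b"binary-blob",  # unchanged between releases
        "etc/setup.sh": b"export FOO=1\n",
    }
    file_catalog = publish(tree, object_store)
    print(len(tree), "files stored as", len(object_store), "objects")
    print(fetch("etc/setup.sh", file_catalog, object_store))
```

In this sketch only the catalog changes when a new software release is published; objects already held by intermediate caches remain valid because their names are derived from their content.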