Managing Very-Large Distributed Datasets

Authors:
Miguel Branco; Ed Zaluska; David Roure; Pedro Salgado; Vincent Garonne; Mario Lassnig; Ricardo Rocha
Affiliations:
CERN - European Organization for Nuclear Research; University of Southampton, UK; University of Innsbruck, Austria
Venue:
OTM '08: Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008, Part I, On the Move to Meaningful Internet Systems
Year:
2008

Abstract

In this paper, we introduce a system for handling very large datasets that must be stored across multiple computing sites. Distributing data introduces complex management issues, particularly because computing sites may use different storage systems with different internal organizations. The motivation for our work is the ATLAS Experiment at the Large Hadron Collider (LHC) at CERN, where the authors are involved in developing the data management middleware. This middleware, called DQ2, is responsible for shipping petabytes of data every month to research centers and universities worldwide and has achieved aggregate throughputs in excess of 1.5 Gbytes/sec over the wide-area network. We describe DQ2's design and implementation, which build upon previous work on distributed file systems, peer-to-peer systems, and Data Grids. We discuss its fault tolerance and scalability properties and briefly describe results from its daily usage for the ATLAS Experiment.