Exascale computers will enable the unraveling of significant scientific mysteries. Predictions are that 2019 will be the year of exascale, with millions of compute nodes and billions of threads of execution. The current architecture of high-end computing systems is decades old and has persisted as we scaled from gigascale to petascale. In this architecture, storage is completely segregated from the compute resources and is connected to them via a network interconnect. This approach will not scale several orders of magnitude in terms of concurrency and throughput, and will thus prevent the move from petascale to exascale. At exascale, basic functionality at high concurrency levels will suffer poor performance which, combined with a system mean time to failure measured in hours, will lead to a performance collapse for large-scale heroic applications. Storage has the potential to be the Achilles heel of exascale systems. We propose that future high-end computing systems be designed with non-volatile memory on every compute node, allowing every compute node to actively participate in metadata and data management and leveraging many-core processors and the high bisection bandwidth of torus networks. This position paper discusses this revolutionary new distributed storage architecture, which will make exascale computing more tractable, touching virtually all disciplines in high-end computing and fueling scientific discovery.