Exploring the future of out-of-core computing with compute-local non-volatile memory

Authors:
Myoungsoo Jung; Ellis H. Wilson, III; Wonil Choi; John Shalf; Hasan Metin Aktulga; Chao Yang; Erik Saule; Umit V. Catalyurek; Mahmut Kandemir
Affiliations:
The University of Texas at Dallas; The Pennsylvania State University; The University of Texas at Dallas and The Pennsylvania State University; Lawrence Berkeley National Laboratory; Lawrence Berkeley National Laboratory; Lawrence Berkeley National Laboratory; Biomedical Informatics; Biomedical Informatics and The Ohio State University; The Pennsylvania State University
Venue:
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2013



Abstract

Paralleling the rise of general-purpose graphics processing units (GPGPUs) as accelerators for specific high-performance computing (HPC) workloads, non-volatile memory (NVM) is increasingly used as an accelerator for I/O-intensive scientific applications. However, existing work has explored the use of NVM within dedicated I/O nodes, which are distant from the compute nodes that actually need such acceleration. As NVM bandwidth begins to outpace point-to-point network capacity, we argue for the need to break from the archetype of completely separated storage. Therefore, in this work we investigate co-location of NVM and compute by varying I/O interfaces, file systems, types of NVM, and both current and future SSD architectures, uncovering numerous bottlenecks implicit in these various levels of the I/O stack. We present novel hardware and software solutions, including the new Unified File System (UFS), to enable fuller utilization of the new compute-local NVM storage. Our experimental evaluation, which employs a real-world out-of-core (OoC) HPC application, demonstrates throughput increases in excess of an order of magnitude over current approaches.