Paralleling the rise of general-purpose graphics processing units (GPGPUs) as accelerators for specific high-performance computing (HPC) workloads, non-volatile memory (NVM) is increasingly used to accelerate I/O-intensive scientific applications. However, existing work has explored the use of NVM within dedicated I/O nodes, which are distant from the compute nodes that actually need such acceleration. As NVM bandwidth begins to outpace point-to-point network capacity, we argue for breaking from the archetype of completely separated storage. In this work we therefore investigate co-locating NVM with compute, varying I/O interfaces, file systems, types of NVM, and both current and future SSD architectures, and uncover numerous bottlenecks implicit at these various levels of the I/O stack. We present novel hardware and software solutions, including the new Unified File System (UFS), that enable fuller utilization of compute-local NVM storage. Our experimental evaluation, which employs a real-world out-of-core (OoC) HPC application, demonstrates throughput increases of more than an order of magnitude over current approaches.
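The argument that NVM bandwidth outpacing network capacity favors co-location can be illustrated with a deliberately simplified bottleneck model. The sketch below is an assumption-laden illustration, not the paper's model. It treats throughput as the minimum over the stages on the data path and ignores software overheads in the I/O stack. The function names and all bandwidth figures are hypothetical.

```python
# Hypothetical bottleneck model: data read from NVM can flow no faster than
# the slowest stage on its path. All bandwidth figures below are illustrative
# assumptions, not measurements from this work.


def remote_throughput(nvm_gbps: float, devices: int, link_gbps: float) -> float:
    """NVM in a dedicated I/O node: capped by the compute node's network link."""
    return min(nvm_gbps * devices, link_gbps)


def local_throughput(nvm_gbps: float, devices: int) -> float:
    """NVM co-located with compute: capped only by the devices themselves."""
    return nvm_gbps * devices


if __name__ == "__main__":
    link = 5.0     # hypothetical point-to-point link bandwidth, GB/s
    per_ssd = 2.0  # hypothetical per-device NVM read bandwidth, GB/s
    for n in (1, 2, 4, 8):
        r = remote_throughput(per_ssd, n, link)
        l = local_throughput(per_ssd, n)
        print(f"{n} device(s): remote {r:.1f} GB/s, local {l:.1f} GB/s")
```

With these assumed numbers, the remote configuration stops scaling once aggregate device bandwidth exceeds the link. Compute-local storage keeps scaling with the number of devices until some other layer of the I/O stack becomes the bottleneck.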