DASH: a Recipe for a Flash-based Data Intensive Supercomputer

Authors:
Jiahua He;Arun Jagatheesan;Sandeep Gupta;Jeffrey Bennett;Allan Snavely
Venue:
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Year:
2010

Abstract

Data intensive computing can be defined as computation involving large datasets and complicated I/O patterns. It is challenging because there is a five-orders-of-magnitude latency gap between main memory DRAM and spinning hard disks; as a result, an inordinate amount of time in data intensive computing is spent accessing data on disk. To address this problem, we designed and built a prototype data intensive supercomputer named DASH that exploits flash-based Solid State Drive (SSD) technology and virtually aggregated DRAM to fill the latency gap. DASH uses commodity parts, including Intel® X25-E flash drives and distributed shared memory (DSM) software from ScaleMP®. The system is highly competitive with several commercial offerings by several metrics, including achieved IOPS (input/output operations per second), IOPS per dollar of system acquisition cost, IOPS per watt during operation, and IOPS per gigabyte (GB) of available storage. We present an overview of the design of DASH, an analysis of its cost efficiency, and a detailed recipe for how we designed and tuned it for high data performance. Finally, we show that, running data-intensive scientific applications from graph theory, biology, and astronomy, we achieved as much as two orders of magnitude speedup compared to the same applications run on traditional architectures.
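
The normalized metrics named in the abstract (IOPS per dollar, per watt, and per gigabyte) are simple ratios. The sketch below shows how they might be computed for a single system. The `StorageSystem` type, the `normalized_metrics` helper, and every number in it are hypothetical placeholders chosen for illustration; they are not measurements of DASH or of any commercial product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageSystem:
    """Hypothetical record of one system's headline figures."""
    name: str
    iops: float         # achieved I/O operations per second
    cost_usd: float     # system acquisition cost in dollars
    power_watts: float  # power draw during operation
    capacity_gb: float  # available storage in gigabytes


def normalized_metrics(system: StorageSystem) -> dict:
    """Return the three cost-efficiency ratios listed in the abstract."""
    return {
        "iops_per_dollar": system.iops / system.cost_usd,
        "iops_per_watt": system.iops / system.power_watts,
        "iops_per_gb": system.iops / system.capacity_gb,
    }


# Placeholder values only (assumed for illustration, not taken from the paper).
example = StorageSystem(
    name="example-system",
    iops=500_000,
    cost_usd=100_000,
    power_watts=2_000,
    capacity_gb=1_000,
)
metrics = normalized_metrics(example)

for key, value in metrics.items():
    print(f"{key}: {value:.1f}")
```

Comparing two systems then amounts to computing these ratios for each and comparing them metric by metric, since a system can lead on one ratio and trail on another.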