Preliminary experiences with the Uintah framework on Intel Xeon Phi and Stampede

Authors:
Qingyu Meng;Alan Humphrey;John Schmidt;Martin Berzins
Affiliations:
University of Utah, Salt Lake City, UT;University of Utah, Salt Lake City, UT;University of Utah, Salt Lake City, UT;University of Utah, Salt Lake City, UT
Venue:
Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery
Year:
2013


Abstract

In this work, we describe our preliminary experiences on the Stampede system in the context of the Uintah Computational Framework. Uintah was developed to provide an environment for solving a broad class of fluid-structure interaction problems on structured adaptive grids. Uintah uses a combination of fluid-flow solvers and particle-based methods, together with a novel asynchronous task-based approach and fully automated load balancing. While we have designed scalable Uintah runtime systems for large CPU core counts, the emergence of heterogeneous systems presents considerable challenges: effectively utilizing additional on-node accelerators and coprocessors, handling deep memory hierarchies, and managing multiple levels of parallelism. Our recent work has addressed heterogeneous CPU/GPU systems with the design of a Unified heterogeneous runtime system, enabling Uintah to fully exploit these architectures with support for asynchronous, out-of-order scheduling of both CPU and GPU computational tasks. Using this design, Uintah has run at full scale on the Keeneland System and TitanDev. With the release of the Intel Xeon Phi coprocessor and the recent availability of the Stampede system, we show that Uintah may be modified to utilize such a coprocessor-based system. We also explore the different usage models provided by the Xeon Phi, with the aim of understanding the portability of a general-purpose framework like Uintah to this architecture. These usage models range from the pragma-based offload model to the more complex symmetric model, which utilizes all coprocessor and host CPU cores simultaneously. We provide preliminary results of the various usage models for a challenging adaptive mesh refinement problem, as well as a detailed account of our experience adapting Uintah to run on the Stampede system.
Our conclusion is that while the Stampede system is easy to use, obtaining high performance from the Xeon Phi coprocessors requires a substantial investment, though a different one from that needed for GPU-based systems.