Executing irregular scientific applications on stream architectures

Authors:
Mattan Erez;Jung Ho Ahn;Jayanth Gummaraju;Mendel Rosenblum;William J. Dally
Affiliations:
The University of Texas at Austin;Hewlett-Packard Laboratories;Stanford University;Stanford University;Stanford University
Venue:
Proceedings of the 21st annual international conference on Supercomputing
Year:
2007

Citing 34
Cited 11

Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
What have we learnt from using real parallel machines to solve real problems?

C3P Proceedings of the third conference on Hypercube concurrent computers and applications - Volume 2
Vector models for data-parallel computing

Vector models for data-parallel computing
Run-Time Parallelization and Scheduling of Loops

IEEE Transactions on Computers
Runtime and language support for compiling adaptive irregular programs on distributed-memory machines

Software—Practice & Experience
Maximizing parallelism and minimizing synchronization with affine transforms

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Digital systems engineering

Digital systems engineering
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs

SIAM Journal on Scientific Computing
Monsoon: an explicit token-store architecture

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Vector instruction set support for conditional operations

Proceedings of the 27th annual international symposium on Computer architecture
The CRAY-1 computer system

Communications of the ACM - Special issue on computer architecture
Efficient conditional operations for data-parallel architectures

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Architecture of the Atlas Chip-Multiprocessor: Dynamically Parallelizing Irregular Applications

IEEE Transactions on Computers
Conversion of control dependence to data dependence

POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs

IEEE Micro
Compiling Global Name-Space Parallel Loops for Distributed Execution

IEEE Transactions on Parallel and Distributed Systems
Asynchronous Problems on SIMD Parallel Computers

IEEE Transactions on Parallel and Distributed Systems
The design and implementation of a parallel array operator for the arbitrary remapping of data

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Program improvement by source to source transformation

POPL '76 Proceedings of the 3rd ACM SIGACT-SIGPLAN symposium on Principles on programming languages
Iterative Methods for Sparse Linear Systems

Iterative Methods for Sparse Linear Systems
Task Pool Teams for Implementing Irregular Algorithms on Clusters of SMPs

IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
A Case for Economy Grid Architecture for Service Oriented Grid Computing

IPDPS '01 Proceedings of the 10th Heterogeneous Computing Workshop - HCW 2001 (Workshop 1) - Volume 2
Programmable Stream Processors

Computer
Scatter-Add in Data Parallel Architectures

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Power Efficient Processor Architecture and The Cell Processor

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Merrimac: Supercomputing with Streams

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Automatic Support for Irregular Computations in a High-Level Language

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Stream Register Files with Indexed Access

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
The potential of the cell processor for scientific computing

Proceedings of the 3rd conference on Computing frontiers
The design space of data-parallel memory systems

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Impulse: Memory system support for scientific applications

Scientific Programming
Tradeoff between data-, instruction-, and thread-level parallelism in stream processors

Proceedings of the 21st annual international conference on Supercomputing
Vectorized sparse matrix multiply for compressed row storage format

ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part I

Streamware: programming general-purpose multicore processors using streams

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Comparative evaluation of memory models for chip multiprocessors

ACM Transactions on Architecture and Code Optimization (TACO)
Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture

Languages and Compilers for Parallel Computing
Matrix-based streamization approach for improving locality and parallelism on FT64 stream processor

The Journal of Supercomputing
Co-processor acceleration of an unmodified parallel solid mechanics code with FEASTGPU

International Journal of Computational Science and Engineering
Analysis and performance results of computing betweenness centrality on IBM Cyclops64

The Journal of Supercomputing
Scalable heterogeneous parallelism for atmospheric modeling and simulation

The Journal of Supercomputing
Exploiting hierarchical parallelisms for molecular dynamics simulation on multicore clusters

The Journal of Supercomputing
Performance analysis and optimization of molecular dynamics simulation on Godson-T many-core processor

Proceedings of the 8th ACM International Conference on Computing Frontiers
I/O streaming evaluation of batch queries for data-intensive computational turbulence

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Scalability study of molecular dynamics simulation on Godson-T many-core architecture

Journal of Parallel and Distributed Computing

Abstract

The recent emergence of compute-intensive stream processors such as the Cell Broadband Engine, Stanford's Merrimac, and ClearSpeed's CSX600 has made them attractive platforms for scientific high-performance computing. Unstructured mesh and graph applications are an important class of numerical algorithms in scientific computing, and they are particularly challenging for stream architectures. These codes have irregular structures in which nodes have a variable number of neighbors, resulting in irregular memory access patterns and irregular control. We study four representative sub-classes of irregular algorithms: finite-element and finite-volume methods for modeling physical systems, direct methods for n-body problems, and computations involving sparse algebra. We propose a framework for representing the diverse characteristics of these algorithms in the context of the unique properties of stream architectures, and demonstrate it using one representative application from each sub-class. We then develop techniques for mapping the applications onto a stream processor, placing emphasis on data localization and parallelization. Our simulations show that efficient stream hardware with restricted control abilities can effectively run challenging irregular applications, with, for example, a finite-element method and a molecular dynamics code sustaining 69 GFLOP/s and 46 GFLOP/s (64-bit), respectively, on a single chip that measures 12 mm on a side and consumes less than 70 W in 90 nm technology.