Efficient support for irregular applications on distributed-memory machines

Authors:
Shubhendu S. Mukherjee; Shamik D. Sharma; Mark D. Hill; James R. Larus; Anne Rogers; Joel Saltz
Affiliations:
Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI; Department of Computer Science, University of Maryland, 4166 A.V. Williams Building, College Park, MD; Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI; Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI; Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ; Department of Computer Science, University of Maryland, 4166 A.V. Williams Building, College Park, MD
Venue:
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
1995

Abstract

Irregular computation problems underlie many important scientific applications. Although these problems are computationally expensive, and so would seem appropriate for parallel machines, their irregular and unpredictable run-time behavior makes this type of parallel program difficult to write and adversely affects run-time performance.

This paper explores three issues crucial to the efficient execution of irregular problems on distributed-memory machines: partitioning, mutual exclusion, and data transfer. Unlike previous work, we studied the same programs running in three alternative systems on the same hardware base (a Thinking Machines CM-5): the CHAOS irregular application library, Transparent Shared Memory (TSM), and eXtensible Shared Memory (XSM). CHAOS and XSM performed equivalently for all three applications. Both systems were faster than TSM, by margins ranging from modest (13%) to significant (991%).