Compiler-directed shared-memory communication for iterative parallel applications

Authors:
Guhan Viswanathan; James R. Larus
Affiliations:
Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI (both authors)
Venue:
Supercomputing '96: Proceedings of the 1996 ACM/IEEE Conference on Supercomputing
Year:
1996

Abstract

Many scientific applications are iterative and exhibit repetitive communication patterns. This paper shows how a parallel-language compiler and custom cache-coherence protocols in a distributed shared memory system can together implement shared-memory communication efficiently for applications whose communication patterns are unpredictable but repetitive. The compiler uses data-flow analysis to identify program points where repetitive communication occurs. At runtime, the custom protocol builds a communication schedule during one iteration and uses it to pre-send data in subsequent iterations. Measurements of three iterative applications (including adaptive programs with unstructured data accesses) show that custom protocols increase the number of shared-data requests satisfied locally, thereby reducing the time spent waiting for remote data.
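The runtime half of this scheme (record a schedule in one iteration, pre-send in later ones) can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the authors' protocol. The class `PresendProtocol`, its methods `begin_phase` and `read`, the block-to-home mapping `block % nprocs`, and the rule that every phase boundary invalidates all remote copies are hypothetical simplifications. Calls to `begin_phase` stand in for the program points that the compiler's data-flow analysis would mark. Here they are placed by hand.

```python
from collections import defaultdict


class PresendProtocol:
    """Hypothetical model of a schedule-building, pre-sending protocol.

    Assumptions (not from the paper): blocks are integers, block b is homed
    on processor b % nprocs, and every read of a non-resident block costs one
    remote miss. Writes are not modeled; each phase boundary simply
    invalidates all remote copies, as if every block had been rewritten.
    """

    def __init__(self, nprocs):
        self.nprocs = nprocs
        # schedule[home] = set of (block, requester) pairs observed on misses
        self.schedule = defaultdict(set)
        # cache[p] = set of remote blocks currently valid on processor p
        self.cache = [set() for _ in range(nprocs)]
        self.local_hits = 0
        self.remote_misses = 0

    def home(self, block):
        return block % self.nprocs

    def begin_phase(self):
        """Stand-in for a compiler-marked communication point.

        Stale remote copies are dropped, then each home node pushes a fresh
        copy of every block its schedule says a processor will request.
        """
        for p in range(self.nprocs):
            self.cache[p].clear()
        for entries in self.schedule.values():
            for block, requester in entries:
                self.cache[requester].add(block)  # pre-send

    def read(self, proc, block):
        if self.home(block) == proc or block in self.cache[proc]:
            self.local_hits += 1
            return
        # Miss: fetch remotely and record the request in the home's schedule.
        self.remote_misses += 1
        self.schedule[self.home(block)].add((block, proc))
        self.cache[proc].add(block)


def run(nprocs, accesses, iterations):
    """Replay the same per-processor read lists for several iterations."""
    proto = PresendProtocol(nprocs)
    misses_per_iteration = []
    for _ in range(iterations):
        proto.begin_phase()
        before = proto.remote_misses
        for proc, blocks in accesses.items():
            for b in blocks:
                proto.read(proc, b)
        misses_per_iteration.append(proto.remote_misses - before)
    return proto, misses_per_iteration


# Irregular but repetitive pattern: the same reads recur every iteration.
accesses = {0: [1, 2, 4, 5, 7], 1: [0, 3, 6], 2: [4, 1, 0], 3: [2, 8]}
proto, misses = run(4, accesses, 5)
print(misses)  # [12, 0, 0, 0, 0]
```

In this toy run the first iteration takes all 12 remote misses and records them. Every later iteration finds the data already pushed, so all 13 reads per iteration are satisfied locally. One design choice to note: the sketch only ever adds schedule entries. When an adaptive program changes its access pattern, the new misses are recorded, but stale entries keep being pre-sent. The abstract does not say how the actual protocol prunes or rebuilds its schedules.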