Compile-time techniques for efficient utilization of parallel memories

Authors:
Rajiv Gupta;Mary Lou Soffa
Affiliations:
Philips Laboratories, North American Philips Corporation, Briarcliff Manor, NY;Dept. of Computer Science, University of Pittsburgh, Pittsburgh, PA
Venue:
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Year:
1988

Abstract

Partitioning shared memory into a number of memory modules is one approach to achieving high memory bandwidth for parallel processors. Memory access conflicts can occur when several processors simultaneously request data from the same memory module. Although work has been done to improve access performance for vectors, no work has been reported on improving the access performance of scalars. For systems in which the processors operate in lock-step mode, a large percentage of memory access conflicts can be predicted at compile time. These conflicts can be avoided by distributing the data appropriately among the memory modules at compile time. A long instruction word machine is an example of such a system: its functional units operate in lock-step mode, performing operations on data fetched in parallel from multiple memory modules. This paper presents compile-time techniques for distributing scalars to avoid memory access conflicts. It also develops algorithms that schedule data transfers among memory modules to avoid the conflicts that distribution of values alone cannot eliminate. The techniques have been implemented as part of a compiler for a reconfigurable long instruction word architecture. Experimental results demonstrate that a very high percentage of memory access conflicts can be avoided by scheduling a very low number of data transfers.
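
To make the idea of compile-time scalar distribution concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes each long instruction word is given as a tuple of the scalars it fetches in parallel, treats two scalars as conflicting when they appear in the same instruction, and assigns modules with a simple greedy heuristic. The function names (`build_conflict_graph`, `assign_modules`, `count_conflicts`) and the input representation are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def build_conflict_graph(instructions):
    """Map each scalar to the set of scalars fetched in the same instruction."""
    graph = {}
    for operands in instructions:
        for name in operands:
            graph.setdefault(name, set())
        for a, b in combinations(set(operands), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def assign_modules(instructions, num_modules):
    """Greedily place each scalar in the module holding the fewest of its
    already-placed conflicting neighbors (an illustrative heuristic only)."""
    graph = build_conflict_graph(instructions)
    assignment = {}
    # Place the most constrained scalars first; break ties by name.
    for name in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = [0] * num_modules
        for neighbor in graph[name]:
            if neighbor in assignment:
                used[assignment[neighbor]] += 1
        # Ties between equally loaded modules go to the lowest index.
        assignment[name] = min(range(num_modules), key=lambda m: (used[m], m))
    return assignment


def count_conflicts(instructions, assignment):
    """Count instructions in which two distinct operands share a module."""
    conflicts = 0
    for operands in instructions:
        modules = [assignment[name] for name in set(operands)]
        if len(modules) != len(set(modules)):
            conflicts += 1
    return conflicts


# Hypothetical instruction stream: each tuple lists scalars fetched together.
instructions = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(count_conflicts(instructions, assign_modules(instructions, 3)))  # 0
print(count_conflicts(instructions, assign_modules(instructions, 2)))  # 1
```

With three modules the sample stream is conflict-free. With two modules one conflict remains, which is the kind of residual conflict the abstract proposes to handle by scheduling data transfers among memory modules.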