Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors

Authors:
Guy E. Blelloch; Phillip B. Gibbons; Yossi Matias; Marco Zagha
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1997

Abstract

For years, the computation rate of processors has far exceeded the access rate of memory banks, and the gap continues to widen. As a result, several shared-memory multiprocessors have more memory banks than processors. This paper provides a simple model, with only a few parameters, for the design and analysis of irregular parallel algorithms that gives a reasonable characterization of performance on such machines. To this end, we extend Valiant's bulk-synchronous parallel (BSP) model with two parameters: the memory bank delay d, the minimum time for servicing requests at a bank, and the memory bank expansion x, the ratio of the number of banks to the number of processors. We call this model the (d, x)-BSP. We show experimentally that the (d, x)-BSP captures the impact of bank contention and delay on the CRAY C90 and J90 for irregular access patterns, without modeling machine-specific details of these machines. The model has clarified the performance characteristics of several unstructured algorithms on the CRAY C90 and J90, and has allowed us to explore tradeoffs and optimizations for these algorithms. Beyond modeling individual algorithms directly, we also consider the (d, x)-BSP as a bridging model for emulating a very high-level abstract model, the Parallel Random Access Machine (PRAM). We provide matching upper and lower bounds for emulating the EREW and QRQW PRAMs on the (d, x)-BSP.
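
The abstract names the two added parameters but does not state a cost formula, so the following is only a rough sketch of how bank delay and bank expansion might enter a per-superstep estimate. It assumes a charge of max(L, g*h, d*k), where h is the largest number of requests issued by any processor and k is the largest number of requests landing on any single bank. That formula, the function name `superstep_cost`, the BSP parameters `g` and `L`, the integer expansion `x`, and the default modulo bank mapping are all assumptions made for illustration and are not taken from the text above.

```python
from collections import Counter


def superstep_cost(addresses_per_proc, p, x, d, g=1, L=0, bank_of=None):
    """Estimate one superstep under an ASSUMED (d, x)-BSP-style charge.

    addresses_per_proc: one list of memory addresses per processor.
    p: number of processors.
    x: bank expansion (banks per processor), assumed to be an integer here.
    d: bank delay (minimum time a bank needs per request).
    g, L: the usual BSP gap and latency/synchronization parameters.
    bank_of: hypothetical address-to-bank mapping; defaults to interleaving.
    """
    num_banks = x * p
    if bank_of is None:
        # Assumed mapping: simple low-order interleaving of addresses.
        bank_of = lambda addr: addr % num_banks

    # h: the most requests issued by any single processor.
    h = max((len(addrs) for addrs in addresses_per_proc), default=0)

    # k: the most requests received by any single bank (the contention).
    bank_load = Counter(
        bank_of(addr) for addrs in addresses_per_proc for addr in addrs
    )
    k = max(bank_load.values(), default=0)

    # Assumed charge: the slowest of latency, processor issue, and bank service.
    return max(L, g * h, d * k)


# Two processors, four banks (x = 2), bank delay d = 3.
contended = [[0, 4, 8], [1, 2, 3]]   # addresses 0, 4, 8 all map to bank 0
spread = [[0, 1, 2], [3, 5, 6]]      # at most two requests per bank
cost_contended = superstep_cost(contended, p=2, x=2, d=3)
cost_spread = superstep_cost(spread, p=2, x=2, d=3)
```

Under these assumptions the contended pattern is bound by the bank term (three requests at one bank, each costing d), while the spread pattern has lower contention and a correspondingly lower estimate. Increasing x adds banks and so tends to reduce k for irregular access patterns.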