On the effective bandwidth of interleaved memories in vector processor systems
IEEE Transactions on Computers
A Simulation Study of the CRAY X-MP Memory System
IEEE Transactions on Computers
A logarithmic time sort for linear size networks
Journal of the ACM (JACM)
Vector Computer Memory Bank Contention
IEEE Transactions on Computers
Probabilistic construction of deterministic algorithms: approximating packing integer programs
Journal of Computer and System Sciences - 27th IEEE Conference on Foundations of Computer Science October 27-29, 1986
Some results in memory conflict analysis
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
A bridging model for parallel computation
Communications of the ACM
Accurate modelling of interconnection networks in vector supercomputers
ICS '91 Proceedings of the 5th international conference on Supercomputing
On randomly interleaved memories
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Journal of Computer and System Sciences
A comparison of sorting algorithms for the connection machine CM-2
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Parallel algorithms for shared-memory machines
Handbook of theoretical computer science (vol. A)
Pseudo-randomly interleaved memory
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Radix sort for vector multiprocessors
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Measurement of memory access contentions in multiple vector processor systems
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Methods for message routing in parallel machines
STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing
An introduction to parallel algorithms
An introduction to parallel algorithms
Characterizing memory performance in vector multiprocessors
ICS '92 Proceedings of the 6th international conference on Supercomputing
An improved supercomputer sorting benchmark
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Performance of cached DRAM organizations in vector supercomputers
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
A comparison of parallel algorithms for connected components
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Towards efficiency and portability: programming with the BSP model
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Deterministic sorting and randomized median finding on the BSP model
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Efficient low-contention parallel algorithms
Journal of Computer and System Sciences
ICS '90 Proceedings of the 4th international conference on Supercomputing
Can shared-memory model serve as a bridging model for parallel computation?
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
The QRQW PRAM: accounting for contention in parallel algorithms
SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
Parallel hashing: an efficient implementation of shared memory
Journal of the ACM (JACM)
Interference in multiprocessor computer systems with interleaved memory
Communications of the ACM
The Art of Computer Programming Volumes 1-3 Boxed Set
The Art of Computer Programming Volumes 1-3 Boxed Set
High-Bandwidth Interleaved Memories for Vector Processors - A Simulation Study
IEEE Transactions on Computers
Analytical Estimation of Vector Access Performance in Parallel Memory Architectures
IEEE Transactions on Computers
Block, Multistride Vector, and FFT Accesses in Parallel Memory Systems
IEEE Transactions on Parallel and Distributed Systems
Practical Parallel Algorithms for Dynamic Data Redistribution, Median Finding, and Selection
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Simulation-based Comparison of Hash Functions for Emulated Shared Memory
PARLE '93 Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe
Polynomial Hash Functions Are Reliable (Extended Abstract)
ICALP '92 Proceedings of the 19th International Colloquium on Automata, Languages and Programming
Bulk synchronous parallel computing-a paradigm for transportable software
HICSS '95 Proceedings of the 28th Hawaii International Conference on System Sciences
Segmented Operations for Sparse Matrix Computation on Vector Multiprocessors
Segmented Operations for Sparse Matrix Computation on Vector Multiprocessors
Journal of Parallel and Distributed Computing
SmartApps: An Application Centric Approach to High Performance Computing
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Mappings for Conflict-Free Access of Paths in Elementary Data Structures
COCOON '00 Proceedings of the 6th Annual International Conference on Computing and Combinatorics
Designing Practical Efficient Algorithms for Symmetric Multiprocessors
ALENEX '99 Selected papers from the International Workshop on Algorithm Engineering and Experimentation
Using PRAM Algorithms on a Uniform-Memory-Access Shared-Memory Architecture
WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
Parallelism versus memory allocation in pipelined router forwarding engines
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Conflict-free star-access in parallel memory systems
Journal of Parallel and Distributed Computing
On the L(h, k)-labeling of co-comparability graphs
ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies
Hi-index | 0.00 |
For years, the computation rate of processors has been much faster than the access rate of memory banks, and this divergence in speeds has been constantly increasing in recent years. As a result, several shared-memory multiprocessors consist of more memory banks than processors. The object of this paper is to provide a simple model (with only a few parameters) for the design and analysis of irregular parallel algorithms that will give a reasonable characterization of performance on such machines. For this purpose, we extend Valiant's bulk-synchronous parallel (BSP) model with two parameters: a parameter for memory bank delay, the minimum time for servicing requests at a bank, and a parameter for memory bank expansion, the ratio of the number of banks to the number of processors. We call this model the (d, x)-BSP. We show experimentally that the (d, x)-BSP captures the impact of bank contention and delay on the CRAY C90 and J90 for irregular access patterns, without modeling machine-specific details of these machines. The model has clarified the performance characteristics of several unstructured algorithms on the CRAY C90 and J90, and allowed us to explore tradeoffs and optimizations for these algorithms. In addition to modeling individual algorithms directly, we also consider the use of the (d, x)-BSP as a bridging model for emulating a very high-level abstract model, the Parallel Random Access Machine (PRAM). We provide matching upper and lower bounds for emulating the EREW and QRQW PRAMs on the (d, x)-BSP.