Experimentally Characterizing the Behavior of Multiprocessor Memory Systems: A Case Study
IEEE Transactions on Software Engineering
Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Performance evaluation and improvement of parallel applications on high performance architectures
Domain decomposition, irregular applications, and parallel computers
IEEE Standard for Scalable Coherent Interface, Science: IEEE Std. 1596-1992
Advanced Computer Architecture: Parallelism, Scalability, Programmability
Parallel Computer Architecture: A Hardware/Software Approach
Scalable Shared-Memory Multiprocessing
Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes
IEEE Transactions on Computers
Modeling the Communication Performance of the IBM SP2
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Advanced performance features of the 64-bit PA-8000
COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
A highly scalable system utilizing up to 128 PA-RISC processors
COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
lmbench: portable tools for performance analysis
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Proceedings of the 25th annual international symposium on Computer architecture
ICS '99 Proceedings of the 13th international conference on Supercomputing
WBTK: a New Set of Microbenchmarks to Explore Memory System Performance for Scientific Computing
International Journal of High Performance Computing Applications
International Journal of Parallel Programming
In a distributed shared memory (DSM) multiprocessor, the processors cooperate in solving a parallel application by accessing the shared memory. The latency of a memory access depends on several factors, including the distance to the nearest valid copy of the data, the data sharing conditions, and traffic from other processors. To provide a better understanding of DSM performance and to support application tuning and compiler development for DSM systems, this paper extends microbenchmarking techniques to characterize the important aspects of a DSM system. We present an experiment-based methodology for characterizing memory, communication, scheduling, and synchronization performance, and apply it to the Convex SPP1000. The carefully designed microbenchmarks characterize the performance of local and remote memory, producer-consumer communication involving two or more processors, and the performance effects of multiple processors contending for the distributed memory and the interconnection network.
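The abstract does not show the microbenchmarks themselves. As a rough illustration of the general idea behind a memory-latency microbenchmark, the sketch below uses pointer chasing: every access depends on the result of the previous one, so accesses cannot overlap and the average time per access approximates latency. This is a generic technique, not the authors' code; the names `build_chain` and `chase` and the working-set sizes are assumptions made for illustration. In Python the interpreter overhead dominates the timing, so the numbers show only the shape of the experiment, not hardware latencies; a real microbenchmark would be written in C or assembly.

```python
import random
import time


def build_chain(n, seed=0):
    """Return a list nxt where following i -> nxt[i] visits all n slots in one cycle."""
    rng = random.Random(seed)
    nxt = list(range(n))
    # Sattolo's algorithm: shuffles into a permutation with a single cycle,
    # so the chase touches every element before it repeats.
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps):
    """Follow the chain for `steps` dependent accesses (steps must be > 0).

    Returns (final index, average seconds per access).
    """
    i = 0
    start = time.perf_counter()
    for _ in range(steps):
        i = nxt[i]  # the next index is known only after this access completes
    elapsed = time.perf_counter() - start
    return i, elapsed / steps


if __name__ == "__main__":
    # Hypothetical working-set sizes; growing the set moves the accesses
    # further down the memory hierarchy.
    for n in (1 << 10, 1 << 16, 1 << 20):
        chain = build_chain(n)
        _, per_access = chase(chain, 1 << 20)
        print(f"{n:>8} elements: {per_access * 1e9:.1f} ns per dependent access")
```

Sweeping the working-set size is what exposes the levels of the memory hierarchy. On a DSM machine the same chase could in principle be run over memory placed on a remote node to compare local and remote latency, although how memory is placed is system-specific and not covered by this sketch.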