WBTK: a New Set of Microbenchmarks to Explore Memory System Performance for Scientific Computing

Authors:
W. Jalby; C. Lemuet; X. Le Pasteur
Affiliations:
PRISM Laboratory, University of Versailles, France (all authors)
Venue:
International Journal of High Performance Computing Applications
Year:
2004

Abstract

Memory hierarchies are a key component in obtaining high performance on modern microprocessors. To satisfy the ever-increasing demand for data access rates, they are also becoming increasingly complex: multilevel caches, non-blocking caches, sophisticated instructions for prefetching and cache control, etc. While these advanced features promise large performance gains, in some cases they also generate performance "anomalies" (i.e., poor performance triggered by specific code patterns). To locate and understand these anomalies precisely, a new set of microbenchmarks called WBTK is introduced. We show through systematic experimentation on the Alpha 21264, Power4, and Itanium 1 that this set of microbenchmarks first allowed us to detect most of the anomalies encountered on simple BLAS1-type codes. Second, it led us to demonstrate that vectorization of memory accesses is an efficient workaround for most of these anomalies.
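To make the idea of a BLAS1-type memory microbenchmark concrete, the following is a minimal sketch and not the WBTK code itself, which this record does not reproduce. It sweeps a DAXPY-style kernel over several array sizes and strides and reports the time per element touched. The function names `daxpy`, `time_per_access`, and `sweep` and the chosen sizes and strides are assumptions made for illustration. In Python, interpreter overhead will largely mask cache effects, so the sketch shows only the structure of such a sweep; a real measurement would use a compiled language.

```python
import time
from array import array


def daxpy(a, x, y, n, stride=1):
    """BLAS1-style update y[i] += a * x[i] for i = 0, stride, 2*stride, ... < n."""
    for i in range(0, n, stride):
        y[i] += a * x[i]


def time_per_access(n, stride=1, repeats=3):
    """Return the best-of-`repeats` time in seconds per element touched."""
    x = array("d", [1.0]) * n  # n doubles, 8 bytes each
    y = array("d", [2.0]) * n
    touched = len(range(0, n, stride))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        daxpy(0.5, x, y, n, stride)
        best = min(best, time.perf_counter() - start)
    return best / touched


def sweep(sizes, strides):
    """Measure every (size, stride) pair; returns {(n, stride): seconds per access}."""
    return {(n, s): time_per_access(n, s) for n in sizes for s in strides}


if __name__ == "__main__":
    # Hypothetical sizes (in elements) meant to span several cache capacities.
    results = sweep(sizes=[1 << 10, 1 << 14, 1 << 18], strides=[1, 2, 8])
    for (n, s), t in sorted(results.items()):
        print(f"n={n:>8} stride={s:>2} {t * 1e9:8.1f} ns/access")
```

In a compiled version of such a sweep, a sudden rise in time per access at a particular size or stride is the kind of pattern-specific behavior the abstract calls an anomaly.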