Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes
IEEE Transactions on Computers
A Portable Programming Interface for Performance Evaluation on Modern Processors
International Journal of High Performance Computing Applications
Automatic measurement of memory hierarchy parameters
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
mhz: anatomy of a micro-benchmark
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
lmbench: portable tools for performance analysis
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Achieving accurate and context-sensitive timing for code optimization
Software—Practice & Experience
Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Automatic measurement of instruction cache capacity
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
The Servet 3.0 benchmark suite: Characterization of network performance degradation
Computers and Electrical Engineering
Hi-index | 0.00 |
DARPA's AACE project aimed to develop Architecture Aware Compiler Environments that automatically characterizes the hardware and optimizes the application codes accordingly. We present the BlackjackBench -- a suite of portable benchmarks that automate system characterization, plus statistical analysis techniques for interpreting the results. The BlackjackBench discovers the effective sizes and speeds of the hardware environment rather than the often unattainable peak values. We aim at hardware characteristics that can be observed by running standard C codes. We characterize the memory hierarchy, including cache sharing and NUMA characteristics of the system, properties of the processing cores affecting instruction execution speed, and the length of the OS scheduler time slot. We show how they all could potentially interfere with each other and how established classification and statistical analysis techniques reduce experimental noise and aid automatic interpretation of results.