Evaluating the performance of cache-affinity scheduling in shared-memory multiprocessors
Journal of Parallel and Distributed Computing
Analysis of benchmark characteristics and benchmark performance prediction
ACM Transactions on Computer Systems (TOCS)
A fast Fourier transform compiler
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
X-Ray: A Tool for Automatic Measurement of Hardware Parameters
QEST '05 Proceedings of the Second International Conference on the Quantitative Evaluation of Systems
lmbench: portable tools for performance analysis
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
The increasing complexity of computer architectures has made the automatic generation of code optimized for the target machine a growing area of interest. Examples of such systems are library generators such as ATLAS, SPIRAL, and FFTW. To generate optimized code without manual intervention, these systems need to know the values of certain hardware parameters, such as the cache size or the number of registers. Existing tools such as X-Ray and LMbench can automatically determine some of these parameters for single-processor superscalar machines, but they cannot determine multi-core-specific characteristics. In this paper, we present P-Ray, a software suite that measures hardware characteristics of multi-core architectures. These characteristics include the number of cores that share the L2 cache, the interconnection topology among processors, and the bandwidth to memory. Our experiments show that P-Ray generates accurate results on the several desktop and server architectures tested.
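As a rough illustration of how a hardware parameter such as cache size can be inferred from measurements, the sketch below looks for the first sharp rise in access latency as the working-set size grows. It is not P-Ray's, X-Ray's, or LMbench's actual method: the function name `estimate_cache_size`, the `jump_ratio` threshold, and the latency figures are all hypothetical, and a real tool would obtain the samples from a carefully controlled microbenchmark.

```python
def estimate_cache_size(samples, jump_ratio=1.5):
    """Estimate a cache capacity from (working_set_bytes, latency_ns) samples.

    Returns the largest working-set size that precedes the first latency
    increase of at least `jump_ratio` (an assumed threshold), or None if
    no such increase is observed.
    """
    ordered = sorted(samples)  # order the samples by working-set size
    for (size_a, lat_a), (_size_b, lat_b) in zip(ordered, ordered[1:]):
        if lat_a > 0 and lat_b / lat_a >= jump_ratio:
            return size_a
    return None


# Synthetic, made-up latencies (ns per access), for illustration only.
synthetic = [
    (16 * 1024, 1.0),
    (32 * 1024, 1.1),
    (64 * 1024, 3.9),
    (128 * 1024, 4.0),
]

print(estimate_cache_size(synthetic))
```

With these synthetic samples, latency rises sharply between the 32 KiB and 64 KiB working sets, so the sketch reports 32 KiB as the estimated capacity. Multi-core characteristics such as which cores share a cache would need a different experiment, for example running such a probe on pairs of cores concurrently and checking for interference.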