In previous research, we developed and presented a model for measuring machines and analyzing programs, and for accurately predicting the running time of any analyzed program on any measured machine. That work is extended here in three ways: 1) by developing a high-level program that measures the design and performance of the cache and TLB units; 2) by using those measurements, along with published miss ratio data, to improve the accuracy of our runtime predictions; and 3) by using our analysis tools and measurements to study and compare the design of several machines, with particular reference to their cache and TLB performance. As part of this work, we describe the design and performance of the cache and TLB for ten machines. The work presented in this paper extends a powerful technique for evaluating and analyzing both computer systems and their workloads; this methodology is valuable to both computer users and computer system designers.
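As a rough illustration of point 2, the sketch below shows one simple way that measured miss penalties and published miss ratios could be folded into a runtime prediction. It is not the paper's actual model: the additive form, the function name `predict_runtime`, and all parameter names and numbers are assumptions made for this example only.

```python
def predict_runtime(base_seconds, memory_refs,
                    cache_miss_ratio, cache_miss_penalty_s,
                    tlb_miss_ratio, tlb_miss_penalty_s):
    """Add estimated cache and TLB stall time to a base prediction.

    Hypothetical additive model (an assumption, not the paper's):
    every memory reference misses with the given ratio, and each
    miss costs a fixed penalty in seconds.
    """
    cache_delay = memory_refs * cache_miss_ratio * cache_miss_penalty_s
    tlb_delay = memory_refs * tlb_miss_ratio * tlb_miss_penalty_s
    return base_seconds + cache_delay + tlb_delay


if __name__ == "__main__":
    # Illustrative numbers only: 10 s base prediction, 1e9 references,
    # 2% cache misses at 100 ns each, 0.1% TLB misses at 500 ns each.
    total = predict_runtime(10.0, 1_000_000_000, 0.02, 100e-9, 0.001, 500e-9)
    print(f"predicted runtime: {total:.2f} s")
```

In this sketch the penalties would come from the machine measurements and the miss ratios from published data, so the same base prediction can be adjusted for each measured machine.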