Existing methods of generating and analyzing traces suffer from a variety of limitations, including complexity, inaccuracy, short length, inflexibility, or applicability only to CISC machines. We use a trace generation mechanism based on link-time code modification that is simple to use, generates accurate long traces of multi-user programs, runs on a RISC machine, and can be flexibly controlled. On-the-fly analysis of the traces lets us obtain accurate performance data for large second-level caches. We compare the performance of systems with second-level caches of 512 KB to 16 MB and show that, for today's large programs, second-level caches larger than 4 MB may be unnecessary. We also show that set associativity in second-level caches larger than 1 MB does not significantly improve system performance. Our experiments also provide insights into first-level and second-level cache line size.
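As a rough illustration of what on-the-fly analysis of an address trace can look like, the sketch below feeds addresses from a generator straight into a two-level cache model, so the trace is never stored. It is not the tool described above. The `Cache` class, the `sweep` and `run` helpers, the LRU replacement policy, the synthetic trace, and the tiny cache sizes are all assumptions chosen to keep the example short.

```python
from collections import OrderedDict


class Cache:
    """Set-associative cache model with LRU replacement (illustrative only)."""

    def __init__(self, size_bytes, line_bytes, ways, next_level=None):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.next_level = next_level
        self.accesses = 0
        self.misses = 0

    def access(self, addr):
        """Look up one address; on a miss, forward it to the next level."""
        self.accesses += 1
        block = addr // self.line_bytes
        lines = self.sets[block % self.num_sets]
        if block in lines:
            lines.move_to_end(block)      # mark as most recently used
            return True
        self.misses += 1
        if self.next_level is not None:
            self.next_level.access(addr)
        if len(lines) >= self.ways:
            lines.popitem(last=False)     # evict the least recently used line
        lines[block] = True
        return False

    def miss_ratio(self):
        return self.misses / self.accesses if self.accesses else 0.0


def sweep(n_bytes, stride, passes):
    """Hypothetical synthetic trace: repeated sequential sweeps over a region."""
    for _ in range(passes):
        for addr in range(0, n_bytes, stride):
            yield addr


def run(trace, first_level):
    """Consume the trace one address at a time, without buffering it."""
    for addr in trace:
        first_level.access(addr)


# Toy sizes so the behaviour is easy to follow by hand.
l2 = Cache(size_bytes=1024, line_bytes=16, ways=1)
l1 = Cache(size_bytes=64, line_bytes=16, ways=1, next_level=l2)
run(sweep(n_bytes=128, stride=4, passes=2), l1)
print(f"L1 miss ratio: {l1.miss_ratio():.2f}, L2 miss ratio: {l2.miss_ratio():.2f}")
```

In this toy run the 128-byte region does not fit in the 64-byte first-level cache but does fit in the second-level cache, so the second sweep misses in the first level and hits in the second. Changing `size_bytes`, `line_bytes`, or `ways` on the second-level model is the kind of parameter comparison the abstract describes, though at a vastly smaller scale.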