Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Exploitation of instruction-level parallelism for detection of processor execution errors
Exploitation of instruction-level parallelism for detection of processor execution errors
Balanced scheduling: instruction scheduling when memory latency is uncertain
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Detecting pipeline structural hazards quickly
POPL '94 Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Optimally profiling and tracing programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Characterization of alpha AXP performance using TP and SPEC workloads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
EEL: machine-independent executable editing
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Active memory: a new abstraction for memory-system simulation
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Performance evaluation of the PowerPC 620 microarchitecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Efficient and language-independent mobile programs
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Computer
Data Dependence Analysis of Assembly Code
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
iWatcher: Efficient Architectural Support for Software Debugging
Proceedings of the 31st annual international symposium on Computer architecture
Efficient and flexible architectural support for dynamic monitoring
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Modern microprocessors offer more instruction-level parallelism than most programs and compilers can currently exploit. The resulting disparity between a machine's peak and actual performance, while frustrating for computer architects and chip manufacturers, opens the exciting possibility of low-cost instrumentation for measurement, simulation, or emulation. Instrumentation code that executes in previously unused processor cycles is effectively hidden. On two superscalar SPARC processors, a simple, local scheduler hid an average of 14% of the overhead cost of profiling instrumentation in the SPECINT benchmarks and an average of 30% of the profiling cost in the SPECFP benchmarks.