Optimizing dynamically-dispatched calls with run-time type feedback
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Shade: a fast instruction-set simulator for execution profiling
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Expected I-cache miss rates via the gap model
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Reducing the frequency of tag compares for low power I-cache design
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
Reconciling responsiveness with performance in pure object-oriented languages
ACM Transactions on Programming Languages and Systems (TOPLAS)
The direct cost of virtual function calls in C++
Proceedings of the 11th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Design decisions influencing the UltraSPARC's instruction fetch architecture
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Memory simulators and software generators
Proceedings of the 1997 symposium on Software reusability
Adaptive page replacement based on memory reference behavior
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
DAISY: dynamic compilation for 100% architectural compatibility
Proceedings of the 24th annual international symposium on Computer architecture
Digital system simulation: methodologies and examples
DAC '98 Proceedings of the 35th annual Design Automation Conference
Accurate indirect branch prediction
Proceedings of the 25th annual international symposium on Computer architecture
The cascaded predictor: economical and adaptive branch target prediction
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Performance counters and state sharing annotations: a unified approach to thread locality
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
An out-of-order execution technique for runtime binary translators
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Accelerating multi-media processing by implementing memoing in multiplication and division units
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Accurately Selecting Block Size at Runtime in Pipelined Parallel Programs
International Journal of Parallel Programming
Dynamo: a transparent dynamic optimization system
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
A Protocol-Centric Approach to on-the-Fly Race Detection
IEEE Transactions on Parallel and Distributed Systems
Java Runtime Systems: Characterization and Architectural Implications
IEEE Transactions on Computers
Improving Java performance using hardware translation
ICS '01 Proceedings of the 15th international conference on Supercomputing
Coupling-driven bus design for low-power application-specific systems
Proceedings of the 38th annual Design Automation Conference
Power-aware partitioned cache architectures
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Low power address encoding using self-organizing lists
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
APEX: access pattern based memory architecture exploration
Proceedings of the 14th international symposium on Systems synthesis
Energy-efficient instruction cache using page-based placement
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Latency and energy aware value prediction for high-frequency processors
ICS '02 Proceedings of the 16th international conference on Supercomputing
Efficient power reduction techniques for time multiplexed address buses
Proceedings of the 15th international symposium on System Synthesis
Access pattern-based memory and connectivity architecture exploration
ACM Transactions on Embedded Computing Systems (TECS)
HP Caliper: A Framework for Performance Analysis Tools
IEEE Concurrency
Do Object-Oriented Languages Need Special Hardware Support?
ECOOP '95 Proceedings of the 9th European Conference on Object-Oriented Programming
An Efficient Indirect Branch Predictor
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
A New Facility for Dynamic Control of Program Execution: DELI
EMSOFT '02 Proceedings of the Second International Conference on Embedded Software
Profiling tools for hardware/software partitioning of embedded applications
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Generating Dynamic Program Analysis Tools
ASWEC '97 Proceedings of the Australian Software Engineering Conference
Performance Modeling Using Object-Oriented Execution-Driven Simulation
SS '96 Proceedings of the 29th Annual Simulation Symposium (SS '96)
Adaptive low-power address encoding techniques using self-organizing lists
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
ATOM: a system for building customized program analysis tools
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Optimizing ML with run-time code generation
ACM SIGPLAN Notices - Best of PLDI 1979-1999
LRU-SEQ: A Novel Replacement Policy for Transition Energy Reduction in Instruction Caches
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
A study of source-level compiler algorithms for automatic construction of pre-execution code
ACM Transactions on Computer Systems (TOCS)
Line Size Adaptivity Analysis of Parameterized Loop Nests for Direct Mapped Data Cache
IEEE Transactions on Computers
A Compiler Analysis of Interprocedural Data Communication
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Flexible ASIC: shared masking for multiple media processors
Proceedings of the 42nd annual Design Automation Conference
Implications of Executing Compression and Encryption Applications on General Purpose Processors
IEEE Transactions on Computers
Tdb: a source-level debugger for dynamically translated programs
Proceedings of the sixth international symposium on Automated analysis-driven debugging
Low overhead program monitoring and profiling
PASTE '05 Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
Software-based instruction caching for embedded processors
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
DIGITAL FX!32 running 32-bit x86 applications on Alpha NT
NT'97 Proceedings of the USENIX Windows NT Workshop 1997
SimICS/sun4m: a virtual workstation
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
A self-adjusting code cache manager to balance start-up time and memory usage
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Dynamo: a transparent dynamic optimization system
ACM SIGPLAN Notices
Issues and support for dynamic register allocation
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Transparent dynamic instrumentation
VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Shade is an instruction-set simulator and custom trace generator. Application programs are executed and traced under the control of a user-supplied trace analyzer. To reduce communication costs, Shade and the analyzer run in the same address space. To further improve performance, the code that simulates and traces the application is dynamically generated and cached for reuse. Current implementations run on SPARC systems and, to varying degrees, simulate the SPARC (Versions 8 and 9) and MIPS I instruction sets. This paper describes the capabilities, design, implementation, and performance of Shade, and discusses instruction-set emulation in general.
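
The two performance ideas in the abstract (an analyzer that runs in the simulator's own address space, and simulation code that is generated once and then cached for reuse) can be illustrated with a minimal sketch. This is not Shade's implementation: the three-instruction toy instruction set, the `translate` and `run` functions, and the `PROGRAM` example are all hypothetical, and Python source generation stands in for the native code generation that a simulator of this kind would perform.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy instruction set (an assumption for illustration only):
#   ("addi", r, imm)     regs[r] += imm
#   ("bnez", r, target)  jump to target if regs[r] != 0, else fall through
#   ("halt", 0, 0)       stop the simulation
Instr = Tuple[str, int, int]
Analyzer = Callable[[int, str], None]
Block = Callable[[List[int], Analyzer], int]


def translate(program: List[Instr], start: int) -> Block:
    """Generate Python code that simulates and traces one basic block."""
    lines = ["def block(regs, analyzer):"]
    pc = start
    while True:
        op, a, b = program[pc]
        # Tracing is inlined into the generated code as a direct call.
        lines.append(f"    analyzer({pc}, {op!r})")
        if op == "addi":
            lines.append(f"    regs[{a}] += {b}")
        elif op == "bnez":
            lines.append(f"    return {b} if regs[{a}] != 0 else {pc + 1}")
            break
        elif op == "halt":
            lines.append("    return -1")
            break
        else:
            raise ValueError(f"unknown opcode {op!r} at {pc}")
        pc += 1
    namespace: Dict[str, object] = {}
    exec("\n".join(lines), namespace)  # compile the generated source
    return namespace["block"]  # type: ignore[return-value]


def run(program: List[Instr], analyzer: Analyzer) -> Tuple[List[int], int]:
    """Run the program, translating each block at most once."""
    regs = [0, 0, 0, 0]
    cache: Dict[int, Block] = {}  # translation cache keyed by block start
    translations = 0
    pc = 0
    while pc != -1:
        block = cache.get(pc)
        if block is None:  # cache miss: generate code for this block
            block = translate(program, pc)
            cache[pc] = block
            translations += 1
        pc = block(regs, analyzer)  # execute the cached translation
    return regs, translations


# A small loop: r0 counts down from 3 while r1 accumulates 2 per iteration.
PROGRAM: List[Instr] = [
    ("addi", 0, 3),
    ("addi", 1, 2),
    ("addi", 0, -1),
    ("bnez", 0, 1),
    ("halt", 0, 0),
]

if __name__ == "__main__":
    trace: List[Tuple[int, str]] = []
    final_regs, n_translations = run(PROGRAM, lambda pc, op: trace.append((pc, op)))
    print(final_regs, n_translations, len(trace))
```

In this sketch the analyzer is an ordinary in-process callback, so each trace record costs one function call rather than a transfer between processes. The loop body is translated the first time control reaches it and is served from the cache on every later iteration.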