An instruction set and microarchitecture for instruction level distributed processing
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Vacuum packing: extracting hardware-detected program phases for post-link optimization
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Dynamic binary translation for accumulator-oriented architectures
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Dynamic trace selection using performance monitoring hardware sampling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Continuous program optimization: A case study
ACM Transactions on Programming Languages and Systems (TOPLAS)
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Hardware Support for Control Transfers in Code Caches
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Using Dynamic Binary Translation to Fuse Dependent Instructions
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Module-aware translation for real-life desktop applications
Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments
Profile-driven code unloading for resource-constrained JVMs
Proceedings of the 3rd international symposium on Principles and practice of programming in Java
ACM Transactions on Architecture and Code Optimization (TACO)
Phase-based visualization and analysis of Java programs
Science of Computer Programming - Special issue: Principles and practices of programming in Java (PPPJ 2004)
Reducing Startup Time in Co-Designed Virtual Machines
Proceedings of the 33rd annual international symposium on Computer Architecture
Managing bounded code caches in dynamic binary optimization systems
ACM Transactions on Architecture and Code Optimization (TACO)
Software-based instruction caching for embedded processors
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
t-kernel: providing reliable OS support to wireless sensor networks
Proceedings of the 4th international conference on Embedded networked sensor systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
VEAL: Virtualized Execution Accelerator for Loops
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Non-intrusive dynamic application profiler for detailed loop execution characterization
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Precise simulation of interrupts using a rollback mechanism
Proceedings of th 12th International Workshop on Software and Compilers for Embedded Systems
Dynamic code footprint optimization for the IBM Cell Broadband Engine
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Non-intrusive dynamic application profiling for multitasked applications
Proceedings of the 46th Annual Design Automation Conference
A highly flexible, parallel virtual machine: design and experience of ILDJIT
Software—Practice & Experience
A model for self-modifying code
IH'06 Proceedings of the 8th international conference on Information hiding
TAO: two-level atomicity for dynamic binary optimizations
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Efficient binary translation system with low hardware cost
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Proceedings of the 17th ACM conference on Computer and communications security
DisIRer: Converting a retargetable compiler into a multiplatform binary translator
ACM Transactions on Architecture and Code Optimization (TACO)
Efficient hardware-based nonintrusive dynamic application profiling
ACM Transactions on Embedded Computing Systems (TECS)
Evaluating indirect branch handling mechanisms in software dynamic translation systems
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 8th ACM International Conference on Computing Frontiers
BlockChop: dynamic squash elimination for hybrid processor architecture
Proceedings of the 39th Annual International Symposium on Computer Architecture
Improving dynamic binary optimization through early-exit guided code region formation
Proceedings of the 9th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Exploiting binary translation for fast ASIP design space exploration on fpgas
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Speculative hardware/software co-designed floating-point multiply-add fusion
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Warm-Up Simulation Methodology for HW/SW Co-Designed Processors
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Hi-index | 14.98 |
We describe a VLIW architecture designed specifically as a target for dynamic compilation of an existing instruction set architecture. This design approach offers the simplicity and high performance of statically scheduled architectures, achieves compatibility with an established architecture, and makes use of dynamic adaptation. Thus, the original architecture is implemented using dynamic compilation, a process we refer to as DAISY (Dynamically Architected Instruction Set from Yorktown). The dynamic compiler exploits runtime profile information to optimize translations so as to extract instruction level parallelism. This work reports different design trade-offs in the DAISY system and their impact on final system performance. The results show high degrees of instruction parallelism with reasonable translation overhead and memory usage.