We describe a novel architectural framework that allows software applications written for a given Complex Instruction Set Computer (CISC) to migrate to a different, higher-performance architecture without a significant investment on the part of the application user or developer. The framework provides a hardware mechanism for seamless switching between two instruction sets, resulting in a machine that enhances application performance while preserving program behavior from the user's perspective. High execution speed on migrated applications is achieved through automated translation of the object code of one machine to that of the other, using advanced global optimization and scheduling techniques. Issues affecting application behavior, such as precise exceptions and self-modifying code, are addressed. Relaxing full compatibility on these issues can yield further performance gains, encouraging applications to adopt the newer architecture.

The proposed framework offers a path for moving from CISCs to newer architectures, such as reduced instruction set computers (RISCs), superscalars, or very long instruction word (VLIW) machines, while protecting the extensive economic investment represented by existing software. To illustrate our approach, we show how system code written and compiled for the IBM System/390 can yield fine-grain parallelism when it is targeted for execution on a VLIW machine, with encouraging performance results.