With recent developments in compilation technology and architectural design, the line between traditional hardware and software roles has become increasingly blurred. The compiler can now see the processor's inner structure, which lets architects exploit sophisticated program analysis techniques to hide, for example, branch and memory access delays. Conversely, processors can now implement register renaming and dynamic instruction-scheduling algorithms directly in hardware, something that was once exclusively the compiler's job. A similar shift is occurring in optimizing compilers for parallel machines. To parallelize a larger class of applications, compiler writers are moving beyond static transformations and exploring techniques that rely on runtime decisions or hardware support. This increased blurring of compile-time and runtime optimizations opens many new research opportunities, particularly for program optimization, a task typically performed entirely at compile time. This article describes an optimization continuum and shows how different classes of optimizations fall within it.