VEAL: Virtualized Execution Accelerator for Loops
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Toward a multicore architecture for real-time ray-tracing
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware
ACM Transactions on Architecture and Code Optimization (TACO)
An instruction-systolic programmable shader architecture for multi-threaded 3D graphics processing
Journal of Parallel and Distributed Computing
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Speculatively vectorized bytecode
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Vapor SIMD: Auto-vectorize once, run everywhere
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Dynamic compilation of data-parallel kernels for vector processors
Proceedings of the Tenth International Symposium on Code Generation and Optimization
On the advantage of time-varying diversity of workload on functionally asymmetric multi-core
Proceedings of International Workshop on Adaptive Self-tuning Computing Systems
Hi-index | 0.00 |
Microprocessor designers commonly utilize SIMD accelerators and their associated instruction set extensions to provide substantial performance gains at a relatively low cost for media applications. One of the most difficult problems with using SIMD accelerators is forward migration to newer generations. With larger hardware budgets and more demands for performance, SIMD accelerators evolve with both larger data widths and increased functionality with each new generation. However, this causes difficult problems in terms of binary compatibility, software migration costs, and expensive pensive redesign of the instruction set architecture. In this work, we propose Liquid SIMD to decouple the instruction set architecture from the SIMD accelerator SIMD instructions are expressed using a processor's baseline scalar instruction set, and light-weight dynamic translation maps the representation onto a broad family of SIMD accelerators. Liquid SIMD effectively bypasses the problems inherent to instruction set modification and binary compatibility across accelerator generations. We provide a detailed description of changes to a compilation framework and processor pipeline needed to support this abstraction. Additionally, we show that the hardware overhead of dynamic optimization is modest, hardware changes do not affect cycle time of the processor, and the performance impact of abstracting the SIMD accelerator is negligible. We conclude that using dynamic techniques to map instructions onto SIMD accelerators is an effective way to improve computation efficiency, without the overhead associated with modifying the instruction set.