Although SIMD (Single Instruction stream, Multiple Data stream) parallel computers have existed for decades, only in the past few years has a new variant emerged: SIMD Within A Register (SWAR). Unlike other styles of SIMD hardware, SWAR models are designed to be integrated into conventional microprocessors, reusing their existing memory reference and instruction handling mechanisms, with the primary goal of speeding up specific multimedia operations. SWAR implementations vary widely across microprocessors, and each lacks instructions for some of the SWAR operations needed to support a more general, portable, high-level SIMD execution model. This paper therefore focuses on how these missing operations can be implemented using either the existing SWAR hardware or even conventional 32-bit integer instructions. SWAR also poses a few new challenges for compiler optimization, which are briefly introduced.
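To make the idea of building a missing SWAR operation from conventional 32-bit integer instructions concrete, the following is a minimal sketch of a partitioned add over four unsigned 8-bit fields packed into one 32-bit word. It uses a widely known masking technique and is not claimed to be the paper's own code; the names `swar_add_u8x4` and `swar_add_u8x4_demo` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the paper): add four unsigned 8-bit fields
 * packed into 32-bit words, each field wrapping modulo 256 independently. */
uint32_t swar_add_u8x4(uint32_t x, uint32_t y)
{
    const uint32_t H = 0x80808080u;      /* top bit of every 8-bit field */

    /* Add only the low 7 bits of each field. The largest per-field sum is
     * 0x7F + 0x7F = 0xFE, so no carry can cross into the neighboring field. */
    uint32_t low = (x & ~H) + (y & ~H);

    /* Bit 7 of each field in `low` now holds the carry out of the low 7 bits.
     * XOR in the original top bits to finish the sum without any
     * field-to-field carry. */
    return low ^ ((x ^ y) & H);
}

/* Small worked example: fields are listed from most to least significant. */
void swar_add_u8x4_demo(void)
{
    /* 0x01+0x01=0x02, 0xFF+0x01=0x00 (wraps), 0x7F+0x7F=0xFE, 0x80+0x80=0x00 (wraps) */
    assert(swar_add_u8x4(0x01FF7F80u, 0x01017F80u) == 0x0200FE00u);
}
```

The sketch spends a few extra mask and XOR operations to keep carries from crossing field boundaries, which is the kind of overhead a compiler targeting SWAR would have to weigh when the hardware offers no native partitioned add.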