Engineering a simple, efficient code-generator generator
ACM Letters on Programming Languages and Systems (LOPLAS)
Code selection for media processors with SIMD instructions
DATE '00 Proceedings of the conference on Design, automation and test in Europe
Exploiting superword level parallelism with multimedia instruction sets
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Hacker's Delight
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Vectorization for SIMD architectures with alignment constraints
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
General-purpose simd within a register: parallel processing on consumer microprocessors
General-purpose simd within a register: parallel processing on consumer microprocessors
Generation of permutations for SIMD processors
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Hi-index | 0.00 |
SIMD instructions are used to speed up multimedia applications in high performance embedded computing. Vendors often use proprietary platforms which are incompatible with others. Therefore, porting software is a very complex and time consuming task. Moreover, lots of existing embedded processors do not have SIMD extensions at all. But they do provide a wide data path which is 32-bit or wider. Usually, multimedia applications work on short data types of 8 or 16-bit. Thus, only the lower bits of the data path are used and therefore only a fraction of the available computing power is exploited for such algorithms. This paper discusses the possibility to make use of the upper bits of the data path by emulating true SIMD instructions. These instructions are implemented purely in software using a high level language such as C. Therefore, the application can be modified by making use of source code transformations which are inherently portable. The benefit of this approach is that the computing resources are used more efficiently without compromising the portability of the code. Experiments have shown that a significant speedup can be obtained by this approach.