Overflow controlled SIMD arithmetic

Authors:
Jiahua Zhu;Hongjiang Zhang;Hui Shi;Binyu Zang;Chuanqi Zhu
Affiliations:
Computer Science Department, Fudan University, Shanghai, China (all authors)
Venue:
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Year:
2004

Abstract

Although "SIMD within a register" parallel architectures have existed for almost 10 years, automatic optimizations for them are still not well developed. Because most optimizations for SIMD architectures are transplanted from traditional vectorization techniques, many special features of these architectures, such as packed operations, have not been thoroughly considered. Because operands are tightly packed within a register, there is no spare space to indicate overflow. To preserve the accuracy of automatically SIMDized programs, the operands must be unpacked to leave enough space for interim overflow, which introduces considerable overhead. Furthermore, the instructions that handle interim overflows can sometimes prevent other optimizations. This paper proposes a new technique, OCSA (overflow controlled SIMD arithmetic), to reduce the negative effects of interim overflow handling and eliminate the interference of interim overflows. We have applied our algorithm to the Berkeley multimedia benchmarks. The experimental results show that the OCSA algorithm can significantly improve the performance of ADPCM-Decoder (110%), MESA-Reflect (113%) and DJVU-Encoder (106%).
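
The interim-overflow problem described in the abstract can be illustrated with a small sketch. It is not the OCSA algorithm itself, which the abstract does not spell out. It only models, under stated assumptions, why packed lanes lose interim results and what unpacking costs. The assumptions are: a 32-bit register holding four unsigned 8-bit lanes, a lane-wise average `(a + b) >> 1` as the sample computation, and hypothetical helper names (`pack_u8x4`, `average_packed`, `average_widened`, and so on) chosen for illustration.

```python
# Model a 32-bit register as a Python int holding four unsigned 8-bit lanes.
# All names here are illustrative assumptions, not taken from the paper.

def pack_u8x4(values):
    """Pack four 8-bit values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word

def unpack_u8x4(word):
    """Return the four 8-bit lanes of a 32-bit word as a list."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def packed_add_u8x4(a, b):
    """Lane-wise wrapping add: carries never cross a lane boundary."""
    low7 = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)   # add the low 7 bits of each lane
    return low7 ^ ((a ^ b) & 0x80808080)          # fold in the top bit of each lane

def average_packed(a, b):
    """(a + b) >> 1 computed entirely in 8-bit lanes; the interim sum wraps."""
    s = packed_add_u8x4(a, b)
    return (s >> 1) & 0x7F7F7F7F                  # per-lane logical shift right

def widen_u8x4(word):
    """Unpack four 8-bit lanes into two words of two 16-bit lanes each."""
    lo = (word & 0xFF) | ((word & 0xFF00) << 8)
    hi = ((word >> 16) & 0xFF) | ((word >> 8) & 0x00FF0000)
    return lo, hi

def narrow_u16x2_pair(lo, hi):
    """Repack two words of 16-bit lanes (values <= 0xFF) into four 8-bit lanes."""
    return ((lo & 0xFF) | ((lo >> 8) & 0xFF00)
            | ((hi & 0xFF) << 16) | ((hi << 8) & 0xFF000000))

def average_widened(a, b):
    """(a + b) >> 1 with 16-bit lanes, so the 9-bit interim sum is preserved."""
    a_lo, a_hi = widen_u8x4(a)
    b_lo, b_hi = widen_u8x4(b)
    avg_lo = ((a_lo + b_lo) >> 1) & 0x00FF00FF    # lane sums <= 0x1FE, no cross-lane carry
    avg_hi = ((a_hi + b_hi) >> 1) & 0x00FF00FF
    return narrow_u16x2_pair(avg_lo, avg_hi)

if __name__ == "__main__":
    a = pack_u8x4([200, 10, 255, 0])
    b = pack_u8x4([100, 20, 255, 1])
    print(unpack_u8x4(average_packed(a, b)))      # [22, 15, 127, 0]: lanes 0 and 2 are wrong
    print(unpack_u8x4(average_widened(a, b)))     # [150, 15, 255, 0]: correct
```

In this model the widened version needs twice as many arithmetic operations plus the unpack and repack steps, even though the final averages fit in 8 bits. That is the kind of overhead from interim-overflow handling that the abstract says OCSA aims to reduce.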