Many computational tasks require repeated evaluation of functions over structured grids, such as plotting in a coordinate system, rendering of parametric objects in 2D and 3D, numerical grid generation, and signal processing. In this paper, we present a method and toolset to speed up closed-form function evaluations over grids by vectorizing Chains of Recurrences (CR). CR forms of closed-form functions require fewer operations to evaluate per grid point. However, the existing CR formalism makes CR forms inherently non-vectorizable because of the dependences carried from one grid point to the next. To address this limitation, we developed a new decoupling method for the CR algebra that translates math functions into Vector Chains of Recurrences (VCR) forms. The VCR coefficients are packed in short vector registers for efficient execution. Performance results of benchmark functions evaluated in single and double precision VCR forms are compared to the Intel compiler's auto-vectorized code and the high-performance small vector math library (SVML). The results show that our VCR method outperforms both SVML and scalar CRs, with speedups ranging from a factor of two to an order of magnitude. We also developed an auto-tuning tool that tunes VCR forms for performance and accuracy.
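To make the loop-carried dependence and the idea of decoupling concrete, the following is a minimal sketch for the polynomial case only. It is not the paper's VCR implementation: the function names (`cr_coefficients`, `cr_step`, `eval_scalar_cr`, `eval_decoupled_cr`) and the `width` parameter are hypothetical, Python lists stand in for short vector registers, and the decoupling shown (one independent chain per lane, each advancing with stride `width * h`) is an assumed illustration of how the dependence between neighboring grid points can be removed.

```python
from fractions import Fraction


def cr_coefficients(f, x0, h, degree):
    """Forward differences of f at x0 with step h.

    For a polynomial of the given degree these are the coefficients of
    the pure-sum CR {c0, +, c1, +, ..., +, c_degree}.
    """
    values = [f(x0 + i * h) for i in range(degree + 1)]
    coeffs = []
    while values:
        coeffs.append(values[0])
        values = [b - a for a, b in zip(values, values[1:])]
    return coeffs


def cr_step(coeffs):
    """Return the current value, then advance the CR by one step in place."""
    value = coeffs[0]
    # Ascending order ensures each update reads the old coeffs[k + 1].
    for k in range(len(coeffs) - 1):
        coeffs[k] += coeffs[k + 1]
    return value


def eval_scalar_cr(f, x0, h, n, degree):
    """Scalar CR: each grid point depends on the state left by the previous one."""
    coeffs = cr_coefficients(f, x0, h, degree)
    return [cr_step(coeffs) for _ in range(n)]


def eval_decoupled_cr(f, x0, h, n, degree, width=4):
    """Decoupled CR (assumed scheme): `width` independent chains.

    Lane j starts at x0 + j*h and advances with stride width*h, so no lane
    reads another lane's state. The per-lane updates are the part that
    could be mapped onto one SIMD operation per coefficient.
    """
    lanes = [cr_coefficients(f, x0 + j * h, width * h, degree)
             for j in range(width)]
    out = []
    while len(out) < n:
        out.extend(cr_step(lane) for lane in lanes)
    return out[:n]


if __name__ == "__main__":
    f = lambda x: 2 * x**3 - x + 5
    x0, h = Fraction(0), Fraction(1, 4)
    direct = [f(x0 + i * h) for i in range(10)]
    print(eval_scalar_cr(f, x0, h, 10, 3) == direct)
    print(eval_decoupled_cr(f, x0, h, 10, 3, width=4) == direct)
```

After initialization, each grid point of a degree-3 polynomial costs three additions per chain and no multiplications, which illustrates the reduced operation count. Exact `Fraction` arithmetic is used here so the comparison with direct evaluation is exact; with floating-point coefficients the recurrences accumulate rounding error, which is the kind of accuracy concern an auto-tuner would have to weigh against speed.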