Improving Memory Traffic by Assembly-Level Exploitation of Reuses for Vector Registers

Authors:
Chih-Yung Chang;Tzung-Shi Chen;Jang-Ping Sheu
Affiliations:
Department of Computer and Information Science, Aletheia University, 32 Chen-Li St., Tamsui, Taipei, Taiwan (changcy@email.au.edu.tw)
Department of Information Management, Chang Jung University, Tainan, Taiwan (chents@mail.cju.edu.tw)
Department of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan (sheujp@csie.ncu.edu.tw)
Venue:
The Journal of Supercomputing
Year:
2000




Abstract

In this paper, we propose a compilation scheme that analyzes and exploits the implicit reuse of data held in vector registers. Based on this reuse analysis, we present a translation strategy that converts vectorized loops into assembly-level vector code that exploits vector reuse. Experimental results show that our compilation technique can reduce both execution time and the traffic between shared memory and vector registers. The techniques discussed here are simple and systematic, and they are easy to implement in conventional vector compilers or translators to enhance the data locality of vector registers.
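
The abstract gives no algorithmic detail, so the following is only a toy illustration of why reusing data already in vector registers can cut memory traffic. It is not the authors' reuse analysis or translation strategy. The function name `count_vector_loads`, the statement encoding, and the LRU replacement policy are all assumptions made for this sketch.

```python
from collections import OrderedDict


def count_vector_loads(statements, num_registers, reuse=True):
    """Count vector loads for a list of (dest, sources) vector statements.

    Each name stands for one vector-length array section (hypothetical model).
    With reuse=False, every source operand is loaded from memory.
    With reuse=True, a section already resident in a vector register
    (previously loaded or just computed) is not loaded again.
    Registers are replaced in LRU order; that policy is an assumption.
    """
    resident = OrderedDict()  # section name -> None, ordered by recency
    loads = 0

    def touch(name):
        # Mark `name` as most recently used, evicting the oldest if needed.
        resident.pop(name, None)
        resident[name] = None
        while len(resident) > num_registers:
            resident.popitem(last=False)

    for dest, sources in statements:
        for src in sources:
            if not (reuse and src in resident):
                loads += 1  # operand must be fetched from memory
            touch(src)
        touch(dest)  # the result now occupies a register
    return loads


# Three vectorized statements over the same array sections.
statements = [
    ("A", ("B", "C")),  # A = B + C
    ("D", ("A", "B")),  # D = A * B
    ("E", ("D", "C")),  # E = D + C
]
naive = count_vector_loads(statements, num_registers=8, reuse=False)
reused = count_vector_loads(statements, num_registers=8, reuse=True)
print(naive, reused)
```

In this model, loading every operand afresh costs six vector loads, while keeping operands and results resident costs two. With fewer registers the saving shrinks, because sections are evicted before they can be reused.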