Matrix-based programming optimization for improving memory hierarchy performance on Imagine

Authors:
Xuejun Yang;Jing Du;Xiaobo Yan;Yu Deng
Affiliations:
School of Computer, National University of Defense Technology, Changsha, China (all authors)
Venue:
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Year:
2006


Abstract

Although Imagine provides an efficient memory hierarchy, straightforward programming of scientific applications does not match that hierarchy and therefore constrains the performance of stream applications. In this paper, we explore a novel matrix-based programming optimization that improves memory hierarchy performance so that it can sustain the operands needed for highly parallel computation. Specifically, we formulate the problem on the Data&Computation Matrix (D&C Matrix), which we propose to abstract the relationship between streams and kernels, and we present the key techniques for improving multilevel bandwidth utilization based on this matrix. Experimental evaluation on five representative scientific applications shows that the stream programs produced by our optimization effectively enhance locality in the local register files (LRF) and the stream register file (SRF), improve the capacity utilization of the LRF and SRF, make the best use of SPs and SBs, and avoid index stream overhead.