Organizing streams well is essential if stream programs, especially scientific ones, are to use the parallel computing resources and memory system of a stream processor effectively. In this paper, after analyzing typical scientific programs, we present and characterize two methods for optimizing stream organization: stream reusing and stream transpose. We run several representative scientific stream programs, with and without these optimizations, on a simulator of a typical stream processor. The simulation results show that these methods can greatly improve the performance of scientific stream programs.