The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Stream processor architecture
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
A programming system for the imagine media processor
A programming system for the imagine media processor
Programmable Stream Processors
Computer
The vlsi implementation and evaluation of area- and energy-efficient streaming media processors
The vlsi implementation and evaluation of area- and energy-efficient streaming media processors
Merrimac: Supercomputing with Streams
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Streaming architectures and technology trends
SIGGRAPH '05 ACM SIGGRAPH 2005 Courses
Hi-index | 0.00 |
Developing scientific computing applications on the stream processor has absorbed a lot of researchers attention. In this paper, we implement and optimize dense LU decomposition on the stream processor. Different from other existing parallel algorithms for LU decomposition, StreamLUD algorithm aims at exploiting producerconsumer locality and at overlapping chip-off memory access with kernel execution. Simulation results show that dealing with matrices of different sizes, compared with LUD of HPL on an Itanium 2 processor, StreamLUD we implement and optimize gets a speedup from 2.56 to 3.64 ultimately.