This paper studies the performance of separable 2D convolution on multi-lane Polymorphic Register Files (PRFs). We present a matrix transposition algorithm optimized for PRFs, and a vectorized 2D convolution algorithm that avoids strided memory accesses. We compare the throughput of our PRF to that of the nVidia Tesla C2050 GPU. The results show that even in bandwidth-constrained systems, multi-lane PRFs can outperform the GPU for mask sizes of 9×9 or larger.
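To illustrate the general idea behind separable 2D convolution with transposition, the following is a minimal sketch in plain Python. It is not the PRF algorithm from the paper: the function names, the zero-padded border handling, and the list-of-lists image layout are all assumptions made for illustration. The sketch runs a 1D pass along the rows, transposes, runs a second 1D pass along the rows of the transposed data, and transposes back, so that both passes read contiguous elements rather than striding down columns.

```python
def convolve_rows(image, kernel):
    """Convolve every row with a 1D kernel (odd length), zero-padding borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                src = x + radius - k  # true convolution: the kernel is flipped
                if 0 <= src < width:  # samples outside the row count as zero
                    acc += weight * row[src]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(matrix):
    """Return the transpose of a rectangular list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def separable_convolve_2d(image, row_kernel, col_kernel):
    """Row pass, transpose, second row pass, transpose back.

    Hypothetical helper: the column pass is expressed as a row pass over
    the transposed intermediate, so no pass walks down a column.
    """
    tmp = convolve_rows(image, row_kernel)
    tmp = transpose(tmp)
    tmp = convolve_rows(tmp, col_kernel)
    return transpose(tmp)


def direct_convolve_2d(image, row_kernel, col_kernel):
    """Reference: direct 2D convolution with the outer-product mask."""
    height, width = len(image), len(image[0])
    rr, rc = len(row_kernel) // 2, len(col_kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for j, cw in enumerate(col_kernel):
                for i, rw in enumerate(row_kernel):
                    sy, sx = y + rc - j, x + rr - i
                    if 0 <= sy < height and 0 <= sx < width:
                        acc += cw * rw * image[sy][sx]
            out[y][x] = acc
    return out
```

For a K×K separable mask, the two-pass form performs on the order of 2K multiply-accumulates per output element instead of K², at the cost of the two transpositions. The `direct_convolve_2d` function is included only as a reference against which to check the separable result.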