ARCS'07 Proceedings of the 20th international conference on Architecture of computing systems
Although current SIMD processor architectures can improve processing performance by exploiting the data-level parallelism inherent in video applications, they incur a significant performance penalty when the data is not laid out in an amenable way, e.g., when memory accesses are unaligned. This paper presents an enhanced DMA controller that performs block-based data transfers and realigns the data when an accessed word in external memory is not aligned with the natural boundary of the data memory/bus width. Moreover, the enhanced DMA controller performs a signal extension when accessing data outside a specific region, e.g., a video frame, which reduces the total number of processing cycles required by a typical video application. Performance improvements of up to 25% can be achieved when running a highly time-consuming video encoding task (motion estimation) on a generic VLIW architecture with the enhanced DMA controller, compared to a basic block-transfer DMA controller.
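As a rough illustration of the two operations described above, the following Python sketch models in software what such a block transfer might deliver to the processor. It is not the paper's hardware design: the function name `fetch_block`, its parameters, and the flat row-major frame layout are hypothetical, and the extension policy (replicating the nearest edge pixel) is an assumption, since the abstract does not specify how the signal is extended.

```python
def fetch_block(frame, width, height, x0, y0, bw, bh):
    """Return a bw x bh block whose top-left corner is at (x0, y0).

    frame is a flat, row-major sequence of pixel values, so a block may
    start at any offset, whether or not it is aligned to the bus width.
    Coordinates outside the frame are clamped to the nearest edge pixel
    (an assumed extension policy, not taken from the paper).
    """
    block = []
    for y in range(y0, y0 + bh):
        cy = min(max(y, 0), height - 1)      # clamp row index into the frame
        row = []
        for x in range(x0, x0 + bw):
            cx = min(max(x, 0), width - 1)   # clamp column index into the frame
            row.append(frame[cy * width + cx])
        block.append(row)
    return block


# A 4x4 frame holding the values 0..15 in row-major order.
frame = bytes(range(16))

# Block fully inside the frame, starting at an odd (unaligned) offset.
inside = fetch_block(frame, 4, 4, 1, 1, 2, 2)

# Block that overlaps the top-left corner of the frame.
border = fetch_block(frame, 4, 4, -1, -1, 3, 2)
```

In this model the caller always receives a dense, fully populated block, which is the property that would let a SIMD kernel such as motion estimation skip its own alignment and boundary checks.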