Abstract: Future media workloads will require about two orders of magnitude more performance than current general-purpose processors achieve. High uni-threaded performance will be needed to meet real-time constraints, together with huge computational throughput, as the next generation of media workloads will be eminently multithreaded (MPEG-4/MPEG-7). To meet the challenge of providing both good uni-threaded performance and throughput, we propose combining the simultaneous multithreading (SMT) execution paradigm with the ability to execute media-oriented streaming µ-SIMD instructions. This paper evaluates the performance of two different aggressive SMT processors: one with conventional µ-SIMD extensions (such as MMX) and one with longer streaming vector µ-SIMD extensions. We show that future media workloads are, in fact, dominated by scalar performance. The combination of SMT and streaming vector µ-SIMD helps alleviate the performance bottleneck of the integer unit. SMT allows "hiding" vector execution underneath integer execution by overlapping the two types of computation, while streaming vector µ-SIMD reduces the pressure on issue width and fetch bandwidth and provides a powerful mechanism for tolerating latency, which makes it possible to implement smart decoupled cache hierarchies.
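The claim that streaming vector µ-SIMD reduces pressure on issue width and fetch bandwidth can be illustrated with a back-of-the-envelope instruction count. The sketch below is a toy model, not the paper's simulator. The function name `instructions_needed`, the 1024-element workload, the 8 lanes per register (8-bit elements in a 64-bit MMX-style register), and the vector length of 16 are all assumptions chosen for illustration.

```python
import math


def instructions_needed(num_elements, lanes, vector_length=1):
    """Count instructions fetched and issued to apply one operation to
    num_elements data elements (hypothetical toy model).

    lanes         -- elements packed into one register (1 for scalar code)
    vector_length -- registers processed by one streaming instruction
                     (1 for conventional µ-SIMD)
    """
    elements_per_instruction = lanes * vector_length
    return math.ceil(num_elements / elements_per_instruction)


pixels = 1024  # assumed workload size: 1024 8-bit pixels

scalar = instructions_needed(pixels, lanes=1)
mmx_like = instructions_needed(pixels, lanes=8)
streaming = instructions_needed(pixels, lanes=8, vector_length=16)

print(scalar, mmx_like, streaming)  # prints: 1024 128 8
```

Under these assumptions, each streaming instruction stands in for 16 conventional µ-SIMD instructions, so far fewer instructions compete for fetch and issue slots. In an SMT processor those slots could then go to the scalar work of other threads.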