A performance analysis of PIM, stream processing, and tiled processing on memory-intensive signal processing kernels

Authors:
Jinwoo Suh;Eun-Gyu Kim;Stephen P. Crago;Lakshmi Srinivasan;Matthew C. French
Affiliations:
University of Southern California, Arlington, VA;University of Southern California, Arlington, VA;University of Southern California, Arlington, VA;University of Southern California, Arlington, VA;University of Southern California, Arlington, VA
Venue:
Proceedings of the 30th annual international symposium on Computer architecture
Year:
2003

Citing 12
Cited 15

Abstract

Increasing die sizes and clock speeds, together with decreasing feature sizes, have fueled rapid growth in microprocessor performance. However, limited improvements in DRAM latency and bandwidth, along with diminishing returns from increasing superscalar ILP and cache sizes, have led to proposals for new microprocessor architectures that implement processor-in-memory (PIM), stream processing, and tiled processing. Each architecture is typically evaluated separately and compared to a baseline architecture. In this paper, we evaluate the performance of processors that implement these architectures on a common set of signal processing kernels. The implementation results are compared with the measured performance of a conventional system based on the PowerPC with AltiVec. The results show that these new processors deliver significant improvements over conventional systems and that each architecture has its own strengths and weaknesses.