A low cost, multithreaded processing-in-memory system

  • Authors:
  • Jay B. Brockman; Shyamkumar Thoziyoor; Shannon K. Kuntz; Peter M. Kogge

  • Affiliations:
  • University of Notre Dame, Notre Dame, IN (all four authors)

  • Venue:
  • WMPI '04: Proceedings of the 3rd Workshop on Memory Performance Issues, in conjunction with the 31st International Symposium on Computer Architecture
  • Year:
  • 2004

Abstract

This paper discusses die cost vs. performance tradeoffs for a processing-in-memory (PIM) system that could serve as the memory system of a host processor. For an increase of less than twice the cost of a commodity DRAM part, it is possible to realize a performance speedup of nearly a factor of 4 on irregular applications. This cost efficiency derives from developing a custom multithreaded processor architecture and implementation style that is well suited for embedding in a memory. Specifically, it takes advantage of the low latency and high row bandwidth to both simplify processor design (reducing area) and improve processing throughput. To support our claims of cost and performance, we have used simulation, analysis of existing chips, and a prototype chip, PIM Lite, which we designed and fully implemented.