Can dataflow subsume von Neumann computing?
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
TAM—a compiler controlled threaded abstract machine
Journal of Parallel and Distributed Computing - Special issue on dataflow and multithreaded architectures
The J-machine multicomputer: an architectural evaluation
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Missing the memory wall: the case for processor/memory integration
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Active pages: a computation model for intelligent memory
Proceedings of the 25th annual international symposium on Computer architecture
Comparison of Single- and Dual-Pass Multiply-Add Fused Floating-Point Units
IEEE Transactions on Computers
A design analysis of a hybrid technology multithreaded architecture for petaflops scale computation3
ICS '99 Proceedings of the 13th international conference on Supercomputing
Microservers: a new memory semantics for massively parallel computing
ICS '99 Proceedings of the 13th international conference on Supercomputing
Monsoon: an explicit token-store architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The architecture of the DIVA processing-in-memory chip
ICS '02 Proceedings of the 16th international conference on Supercomputing
The Message Driven Processor: An Integrated Multicomputer Processing Element
ICCD '92 Proceedings of the 1991 IEEE International Conference on Computer Design on VLSI in Computer & Processors
Gilgamesh: a multithreaded processor-in-memory architecture for petaflops computing
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Programming the FlexRAM parallel intelligent memory system
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
MTDT '99 Proceedings of the 1999 IEEE International Workshop on Memory Technology, Design, and Testing
Enhancing NIC Performance for MPI using Processing-in-Memory
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10
PIM lite: a multithreaded processor-in-memory prototype
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Embedded floating-point units in FPGAs
Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays
Architectural modifications to enhance the floating-point performance of FPGAs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The challenges of massive on-chip concurrency
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Hi-index | 0.00 |
This paper discusses die cost vs. performance tradeoffs for a PIM system that could serve as the memory system of a host processor. For an increase of less than twice the cost of a commodity DRAM part, it is possible to realize a performance speedup of nearly a factor of 4 on irregular applications. This cost efficiency derives from developing a custom multithreaded processor architecture and implementation style that is well-suited for embedding in a memory. Specifically, it takes advantage of the low latency and high row bandwidth to both simplify processor design --- reducing area --- as well as to improve processing throughput. To support our claims of cost and performance, we have used simulation, analysis of existing chips, and also designed and fully implemented a prototype chip, PIM Lite.