Future generation supercomputers I: a paradigm for node architecture
ACM SIGARCH Computer Architecture News - Special issue: ALPS '07---advanced low power systems
The von Neumann bottleneck is a major impediment to attaining higher performance. Novel merged logic-memory architectures such as the Processor In Memory (PIM) approach seek to ease this bottleneck. This paper introduces the Memory In Processor (MIP) architecture, which overcomes the bottleneck by integrating the memory both logically and physically into the functional units of the processor, thereby creating a memory-like organization. This is unlike PIM, which involves purely physical integration. The MIP node employs High-Level Functional Units (HLF units) such as matrix multipliers, matrix inverters, sorter units and graph algorithm solvers, all of which are designed to be memory-like. This integration makes functional-unit density equivalent to that of SRAM, which motivates a new memory-based metric that quantifies HLF unit capability in bytes. The 2 MB MIP node operates on 128-bit data and has been shown to match the performance of a 3D torus cluster of 5 × 5 × 4 uniprocessor nodes, thereby achieving supercomputing on a multi-billion-device chip. The MIP cluster deviates markedly from conventional approaches to cluster-based supercomputing and attains high performance while maintaining a smaller node count.