This article presents a study of the impact of packaging on the memory and power walls, in the context of application properties. The analysis is supported by characterizations of 130 hardware designs spanning 30 years, along with both microarchitectural simulation and actual-hardware performance counter measurements of 25 applications. It is shown that if trends in supply pin count (growing as the square root of current) and total packaging pin count (doubling every six years) continue, application memory bandwidth requirements, even in the presence of aggressive cache hierarchies, may limit the number of on-chip threads to under a thousand in 2020.
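The abstract's extrapolation can be sketched as a back-of-envelope calculation: total package pin count doubles every six years, and off-chip bandwidth (a function of signal pin count) divided by per-thread bandwidth demand bounds the number of on-chip threads. All baseline numbers below (base year, base pin count, per-pin and per-thread bandwidths, signal-pin fraction) are illustrative assumptions chosen for the sketch, not figures from the paper.

```python
# Illustrative extrapolation of the packaging trend described in the
# abstract. Every constant here is a hypothetical placeholder.

def pins_in_year(year, base_year=2008, base_pins=2000, doubling_years=6):
    """Total package pins, assuming a fixed doubling period (the
    abstract's 'doubling every six years' trend)."""
    return base_pins * 2 ** ((year - base_year) / doubling_years)

def max_threads(year, signal_fraction=0.5, gbps_per_pin=1.0,
                gbps_per_thread=5.0):
    """Threads sustainable by off-chip bandwidth alone: only a fraction
    of pins carry signals (the rest are supply pins), each signal pin
    provides some bandwidth, and each thread demands some bandwidth
    even behind an aggressive cache hierarchy."""
    signal_pins = pins_in_year(year) * signal_fraction
    return int(signal_pins * gbps_per_pin / gbps_per_thread)

print(pins_in_year(2020))   # 2000 * 2^((2020-2008)/6) = 8000.0 pins
print(max_threads(2020))    # 4000 signal pins * 1 / 5 = 800 threads
```

With these (hypothetical) parameters the bound lands under a thousand threads in 2020, matching the shape of the abstract's conclusion; the point of the sketch is only that the bound grows with packaged bandwidth, not with transistor count.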