Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
System-level performance evaluation of three-dimensional integrated circuits
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on system-level interconnect prediction
Impact of three-dimensional architectures on interconnects in gigascale integration
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - System Level Design
Design Challenges of Technology Scaling
IEEE Micro
Optimizing pipelines for power and performance
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Complexity-effective superscalar processors
Complexity-effective superscalar processors
Temperature-aware microarchitecture: Modeling and implementation
ACM Transactions on Architecture and Code Optimization (TACO)
Technology, performance, and computer-aided design of three-dimensional integrated circuits
Proceedings of the 2004 international symposium on Physical design
Efficient Thermal Placement of Standard Cells in 3D ICs using a Force Directed Approach
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
2.5D system integration: a design driven system implementation schema
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
3D Processing Technology and Its Impact on iA32 Microprocessors
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Three-Dimensional Cache Design Exploration Using 3DCacti
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Implementing Caches in a 3D Technology for High Performance Processors
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Implementing Register Files for High-Performance Microprocessors in a Die-Stacked (3D) Technology
ISVLSI '06 Proceedings of the IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures
Thermal-driven multilevel routing for 3-D ICs
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Floorplanning for 3-D VLSI design
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Die Stacking (3D) Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Scalability of 3D-integrated arithmetic units in high-performance microprocessors
Proceedings of the 44th annual Design Automation Conference
A modular 3d processor for flexible product design and technology migration
Proceedings of the 5th conference on Computing frontiers
Investigating the effects of fine-grain three-dimensional integration on microarchitecture design
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Design space exploration for 3-D cache
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The impact of liquid cooling on 3D multi-core processors
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Thermal-aware floorplan schemes for reliable 3D multi-core processors
ICCSA'11 Proceedings of the 2011 international conference on Computational science and its applications - Volume Part II
Adaptive dynamic frequency scaling for thermal-aware 3d multi-core processors
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part IV
Hi-index | 0.00 |
We present the design of high-performance and energy-efficient dynamic instruction schedulers in a 3-Dimensional integration technology. Based on a previous observation that the critical path latency of a conventional dynamic scheduler is greatly affected by wire delay, we propose 3D-integrated scheduler designs by partitioning a conventional scheduler across multiple vertically-stacked die. The die-stacked organization reduces the lengths of critical wires thus reducing both latency and energy. Our simulation results show that a 20-entry (120-entry) instruction scheduler implemented in a 2-die stack achieves a 9% (19%) reduction in latency with simultaneous energy reduction as compared to a conventional planar design. The benefits are even larger when the instruction scheduler is implemented on a 4-die stack, with the corresponding latency reductions being 12% (32%).