The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Computer
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A Delay Model and Speculative Architecture for Pipelined Routers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Technology, performance, and computer-aided design of three-dimensional integrated circuits
Proceedings of the 2004 international symposium on Physical design
Low-Latency Virtual-Channel Routers for On-Chip Networks
Proceedings of the 31st annual international symposium on Computer architecture
2.5D system integration: a design driven system implementation schema
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
3D Processing Technology and Its Impact on iA32 Microprocessors
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Proceedings of the 32nd annual international symposium on Computer Architecture
A NUCA substrate for flexible CMP cache sharing
Proceedings of the 19th annual international conference on Supercomputing
Design and analysis of an NoC architecture from performance, reliability and energy perspective
Proceedings of the 2005 ACM symposium on Architecture for networking and communications systems
Three-Dimensional Cache Design Exploration Using 3DCacti
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Implementing Caches in a 3D Technology for High Performance Processors
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Demystifying 3D ICs: The Pros and Cons of Going Vertical
IEEE Design & Test
A Hybrid SoC Interconnect with Dynamic TDMA-Based Transaction-Less Buses and On-Chip Networks
VLSID '06 Proceedings of the 19th International Conference on VLSI Design held jointly with 5th International Conference on Embedded Systems Design
Thermal-driven multilevel routing for 3-D ICs
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Thermal Trends in Emerging Technologies
ISQED '06 Proceedings of the 7th International Symposium on Quality Electronic Design
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Energy- and performance-aware mapping for regular NoC architectures
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A novel dimensionally-decomposed router for on-chip communication in 3D architectures
Proceedings of the 34th annual international symposium on Computer architecture
Interconnect design considerations for large NUCA caches
Proceedings of the 34th annual international symposium on Computer architecture
A modular 3d processor for flexible product design and technology migration
Proceedings of the 5th conference on Computing frontiers
Invited paper: Network-on-Chip design and synthesis outlook
Integration, the VLSI Journal
Software-directed combined cpu/link voltage scaling fornoc-based cmps
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
3-D topologies for networks-on-chip
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
MIRA: A Multi-layered On-Chip Interconnect Router Architecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A practical approach of memory access parallelization to exploit multiple off-chip DDR memories
Proceedings of the 45th annual Design Automation Conference
A multi-story power delivery technique for 3D integrated circuits
Proceedings of the 13th international symposium on Low power electronics and design
An open-loop flow control scheme based on the accurate global information of on-chip communication
Proceedings of the conference on Design, automation and test in Europe
Investigating the effects of fine-grain three-dimensional integration on microarchitecture design
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Parametric yield management for 3D ICs: Models and strategies for improvement
ACM Journal on Emerging Technologies in Computing Systems (JETC)
A novel migration-based NUCA design for chip multiprocessors
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Three-dimensional Integrated Circuit Design
Three-dimensional Integrated Circuit Design
Thermal optimization in multi-granularity multi-core floorplanning
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Synthesis of networks on chips for 3D systems on chips
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Hybrid cache architecture with disparate memory technologies
Proceedings of the 36th annual international symposium on Computer architecture
Exploration of 3D stacked L2 cache design for high performance and efficient thermal control
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Networks-on-chip in emerging interconnect paradigms: Advantages and challenges
NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Scalability of network-on-chip communication architecture for 3-D meshes
NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Exploring serial vertical interconnects for 3D ICs
Proceedings of the 46th Annual Design Automation Conference
No cache-coherence: a single-cycle ring interconnection for multi-core L1-NUCA sharing on 3D chips
Proceedings of the 46th Annual Design Automation Conference
An SDRAM-aware router for Networks-on-Chip
Proceedings of the 46th Annual Design Automation Conference
Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Variation-tolerant non-uniform 3D cache management in die stacked multicore processor
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Thermal modeling for 3D-ICs with integrated microchannel cooling
Proceedings of the 2009 International Conference on Computer-Aided Design
From 2D to 3D NoCs: a case study on worst-case communication performance
Proceedings of the 2009 International Conference on Computer-Aided Design
Cost-driven 3D integration with interconnect layers
Proceedings of the 47th Design Automation Conference
Efficient OpenMP data mapping for multicore platforms with vertically stacked memory
Proceedings of the Conference on Design, Automation and Test in Europe
SunFloor 3D: a tool for networks on chip topology synthesis for 3D systems on chips
Proceedings of the Conference on Design, Automation and Test in Europe
System-level process variability analysis and mitigation for 3D MPSoCs
Proceedings of the Conference on Design, Automation and Test in Europe
System-level power/performance evaluation of 3D stacked DRAMs for mobile applications
Proceedings of the Conference on Design, Automation and Test in Europe
An SDRAM-aware router for networks-on-chip
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems - Special section on the ACM IEEE international conference on formal methods and models for codesign (MEMOCODE) 2009
Vertical stealing: robust, locality-aware do-all workload distribution for 3D MPSoCs
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Design exploration of hybrid caches with disparate memory technologies
ACM Transactions on Architecture and Code Optimization (TACO)
A novel 3D layer-multiplexed on-chip network
Proceedings of the 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Application-specific 3D Network-on-Chip design using simulated allocation
Proceedings of the 2010 Asia and South Pacific Design Automation Conference
Sunfloor 3D: a tool for networks on chip topology synthesis for 3-D systems on chips
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hierarchical circuit-switched NoC for multicore video processing
Microprocessors & Microsystems
OPAL: a multi-layer hybrid photonic NoC for 3D ICs
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Vertical interconnects squeezing in symmetric 3D mesh network-on-chip
Proceedings of the 16th Asia and South Pacific Design Automation Conference
3D network-on-chip architectures using homogeneous meshes and heterogeneous floorplans
International Journal of Reconfigurable Computing - Special issue on selected papers from ReconFig 2009 International conference on reconfigurable computing and FPGAs (ReconFig 2009)
Application exploration for 3-d integrated circuits: TCAM, FIFO, and FFT case studies
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Design and management of 3D-stacked NUCA cache for chip multiprocessors
Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
Microprocessors & Microsystems
3D floorplanning of low-power and area-efficient Network-on-Chip architecture
Microprocessors & Microsystems
A vertical bubble flow network using inductive-coupling for 3-D CMPs
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Exploring partitioning methods for 3D Networks-on-Chip utilizing adaptive routing model
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Cluster-based topologies for 3D stacked architectures
Proceedings of the 8th ACM International Conference on Computing Frontiers
Worst-case temperature analysis for different resource availabilities: a case study
PATMOS'11 Proceedings of the 21st international conference on Integrated circuit and system design: power and timing modeling, optimization, and simulation
NEMS based thermal management for 3D many-core system
NANOARCH '11 Proceedings of the 2011 IEEE/ACM International Symposium on Nanoscale Architectures
A study of 3D Network-on-Chip design for data parallel H.264 coding
Microprocessors & Microsystems
3D NOC for many-core processors
Microelectronics Journal
Application-specific temperature reduction systematic methodology for 2d and 3d networks-on-chip
PATMOS'09 Proceedings of the 19th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
3D-ICE: fast compact transient thermal modeling for 3D ICs with inter-tier liquid cooling
Proceedings of the International Conference on Computer-Aided Design
A data layout optimization framework for NUCA-based multicores
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Optimized 3D Network-on-Chip Design Using Simulated Allocation
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Performance/Thermal-Aware Design of 3D-Stacked L2 Caches for CMPs
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Three-dimensional Integrated Circuits: Design, EDA, and Architecture
Foundations and Trends in Electronic Design Automation
Reliability-aware platform optimization for 3D chip multi-processors
The Journal of Supercomputing
Cost-efficient buffer sizing in shared-memory 3D-MPSoCs using wide I/O interfaces
Proceedings of the 49th Annual Design Automation Conference
Exploration of heuristic scheduling algorithms for 3D multicore processors
Proceedings of the 15th International Workshop on Software and Compilers for Embedded Systems
Software—Practice & Experience
Reconfigurable multicore architecture for dynamic processor reallocation
ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
XPoint cache: scaling existing bus-based coherence protocols for 2D and 3D many-core systems
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Power consumption and performance analysis of 3D NoCs
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
A high-efficiency low-cost heterogeneous 3D network-on-chip design
Proceedings of the Fifth International Workshop on Network on Chip Architectures
Multiobjective optimization of deadspace, a critical resource for 3D-IC integration
Proceedings of the International Conference on Computer-Aided Design
Cost evaluation on reuse of generic network service dies in three-dimensional integrated circuits
Microelectronics Journal
Developing a power-efficient and low-cost 3D NoC using smart GALS-based vertical channels
Journal of Computer and System Sciences
Cluster-based topologies for 3D Networks-on-Chip using advanced inter-layer bus architecture
Journal of Computer and System Sciences
Proceedings of the 2013 ACM international symposium on International symposium on physical design
Reliability-Aware Heterogeneous 3D Chip Multiprocessor Design
Journal of Electronic Testing: Theory and Applications
Future memory and interconnect technologies
Proceedings of the Conference on Design, Automation and Test in Europe
Randomized partially-minimal routing: near-optimal oblivious routing for 3-D mesh networks
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Thermal-constrained task allocation for interconnect energy reduction in 3-D homogeneous MPSoCs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The Journal of Supercomputing
Journal of Electronic Testing: Theory and Applications
Virtualized and fault-tolerant inter-layer-links for 3D-ICs
Microprocessors & Microsystems
High-speed dynamic TDMA arbiter for inter-layer communications in 3-D network-on-chip
Journal of High Speed Networks
Hi-index | 0.00 |
Long interconnects are becoming an increasingly important problem from both power and performance perspectives. This motivates designers to adopt on-chip network-based communication infrastructures and three-dimensional (3D) designs where multiple device layers are stacked together. Considering the current trends towards increasing use of chip multiprocessing, it is timely to consider 3D chip multiprocessor design and memory networking issues, especially in the context of data management in large L2 caches. The overall goal of this paper is to study the challenges for L2 design and management in 3D chip multiprocessors. Our first contribution is to propose a router architecture and a topology design that makes use of a network architecture embedded into the L2 cache memory. Our second contribution is to demonstrate, through extensive experiments, that a 3D L2 memory architecture generates much better results than the conventional two-dimensional (2D) designs under different number of layers and vertical (inter-wafer) connections. In particular, our experiments show that a 3D architecture with no dynamic data migration generates better performance than a 2D architecture that employs data migration. This also helps reduce power consumption in L2 due to a reduced number of data movements.