The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Generating representative Web workloads for network and server performance evaluation
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
System-level performance evaluation of three-dimensional integrated circuits
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on system-level interconnect prediction
Compact thermal modeling for temperature-aware design
Proceedings of the 41st annual Design Automation Conference
3D Processing Technology and Its Impact on iA32 Microprocessors
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Thermal via placement in 3D ICs
Proceedings of the 2005 international symposium on Physical design
Logic-based eDRAM: origins and rationale for use
IBM Journal of Research and Development - Electrochemical technology in microelectronics
Performance/Watt: the new server focus
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Demystifying 3D ICs: The Pros and Cons of Going Vertical
IEEE Design & Test
The M5 Simulator: Modeling Networked Systems
IEEE Micro
Power-Performance Implications of Thread-level Parallelism on Chip Multiprocessors
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Die Stacking (3D) Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
SMP-SoC is the answer if you ask the right questions
SAICSIT '06 Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries
A novel dimensionally-decomposed router for on-chip communication in 3D architectures
Proceedings of the 34th annual international symposium on Computer architecture
Microprocessors in the era of terascale integration
Proceedings of the conference on Design, automation and test in Europe
Design principles for a virtual multiprocessor
Proceedings of the 2007 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries
A modular 3d processor for flexible product design and technology migration
Proceedings of the 5th conference on Computing frontiers
Invited paper: Network-on-Chip design and synthesis outlook
Integration, the VLSI Journal
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
MIRA: A Multi-layered On-Chip Interconnect Router Architecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
3D-Stacked Memory Architectures for Multi-core Processors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Trend and Challenge on System-on-a-Chip Designs
Journal of Signal Processing Systems
The StageNet fabric for constructing resilient multicore systems
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Efficient implementation of decoupling capacitors in 3D processor-dram integrated computing systems
Proceedings of the 19th ACM Great Lakes symposium on VLSI
A durable and energy efficient main memory using phase change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
Disaggregated memory for expansion and sharing in blade servers
Proceedings of the 36th annual international symposium on Computer architecture
Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Variation-tolerant non-uniform 3D cache management in die stacked multicore processor
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Cost-aware three-dimensional (3D) many-core multiprocessor design
Proceedings of the 47th Design Automation Conference
An efficient distributed memory interface for many-core platform with 3D stacked DRAM
Proceedings of the Conference on Design, Automation and Test in Europe
An RDL-configurable 3D memory tier to replace on-chip SRAM
Proceedings of the Conference on Design, Automation and Test in Europe
System-level power/performance evaluation of 3D stacked DRAMs for mobile applications
Proceedings of the Conference on Design, Automation and Test in Europe
Design exploration of hybrid caches with disparate memory technologies
ACM Transactions on Architecture and Code Optimization (TACO)
A novel 3D layer-multiplexed on-chip network
Proceedings of the 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Fabrication cost analysis and cost-aware design space exploration for 3-D ICs
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Microprocessors & Microsystems
A composite and scalable cache coherence protocol for large scale CMPs
Proceedings of the international conference on Supercomputing
Architecting on-chip interconnects for stacked 3D STT-RAM caches in CMPs
Proceedings of the 38th annual international symposium on Computer architecture
Token3D: reducing temperature in 3d die-stacked CMPs through cycle-level power control mechanisms
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
A study of 3D Network-on-Chip design for data parallel H.264 coding
Microprocessors & Microsystems
Modeling the computational efficiency of 2-D and 3-D silicon processors for early-chip planning
Proceedings of the International Conference on Computer-Aided Design
Efficient memory management of a hierarchical and a hybrid main memory for MN-MATE platform
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
System-level integrated server architectures for scale-out datacenters
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Three-dimensional Integrated Circuits: Design, EDA, and Architecture
Foundations and Trends in Electronic Design Automation
Reliability-aware platform optimization for 3D chip multi-processors
The Journal of Supercomputing
A limits study of benefits from nanostore-based future data-centric system architectures
Proceedings of the 9th conference on Computing Frontiers
Proceedings of the 39th Annual International Symposium on Computer Architecture
XPoint cache: scaling existing bus-based coherence protocols for 2D and 3D many-core systems
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors
ACM Transactions on Computer Systems (TOCS)
Understanding fundamental design choices in single-ISA heterogeneous multicore architectures
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Distributed memory interface synthesis for network-on-chips with 3D-stacked DRAMs
Proceedings of the International Conference on Computer-Aided Design
Rethinking DRAM Power Modes for Energy Proportionality
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
3D integration for power-efficient computing
Proceedings of the Conference on Design, Automation and Test in Europe
Proceedings of the 40th Annual International Symposium on Computer Architecture
Randomized partially-minimal routing: near-optimal oblivious routing for 3-D mesh networks
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ACM Transactions on Architecture and Code Optimization (TACO)
Integrated 3D-stacked server designs for increasing physical density of key-value stores
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
In this paper, we show how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing. Our proposed architecture, PicoServer, employs 3D technology to bond one die containing several simple slow processing cores to multiple DRAM dies sufficient for a primary memory. The 3D technology also enables wide low-latency buses between processors and memory. These remove the need for an L2 cache allowing its area to be re-allocated to additional simple cores. The additional cores allow the clock frequency to be lowered without impairing throughput. Lower clock frequency in turn reduces power and means that thermal constraints, a concern with 3D stacking, are easily satisfied.The PicoServer architecture specifically targets Tier 1 server applications, which exhibit a high degree of thread level parallelism. An architecture targeted to efficient throughput is ideal for this application domain. We find for a similar logic die area, a 12 CPU system with 3D stacking and no L2 cache outperforms an 8 CPU system with a large on-chip L2 cache by about 14% while consuming 55% less power. In addition, we show that a PicoServer performs comparably to a Pentium 4-like class machine while consuming only about 1/10 of the power, even when conservative assumptions are made about the power consumption of the PicoServer.