High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A performance comparison of contemporary DRAM architectures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Proceedings of the 27th annual international symposium on Computer architecture
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
Power and energy reduction via pipeline balancing
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Temperature-aware microarchitecture
Proceedings of the 30th annual international symposium on Computer architecture
Technology, performance, and computer-aided design of three-dimensional integrated circuits
Proceedings of the 2004 international symposium on Physical design
Adaptive History-Based Memory Schedulers
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Bridging the Processor-Memory Performance Gapwith 3D IC Technology
IEEE Design & Test
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
A thermally-aware performance analysis of vertically integrated (3-D) processor-memory hierarchy
Proceedings of the 43rd annual Design Automation Conference
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Scalable Cache Miss Handling for High Memory-Level Parallelism
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Die Stacking (3D) Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
ACM SIGARCH Computer Architecture News
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
BioBench: A Benchmark Suite of Bioinformatics Applications
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Leveraging 3D Technology for Improved Reliability
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Investigating the effects of fine-grain three-dimensional integration on microarchitecture design
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Toward a multicore architecture for real-time ray-tracing
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Efficient implementation of decoupling capacitors in 3D processor-dram integrated computing systems
Proceedings of the 19th ACM Great Lakes symposium on VLSI
A durable and energy efficient main memory using phase change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
Hybrid cache architecture with disparate memory technologies
Proceedings of the 36th annual international symposium on Computer architecture
Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Variation-tolerant non-uniform 3D cache management in die stacked multicore processor
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
The impact of liquid cooling on 3D multi-core processors
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Re-architecting DRAM memory systems with monolithically integrated silicon photonics
Proceedings of the 37th annual international symposium on Computer architecture
Rethinking DRAM design and organization for energy-constrained multi-cores
Proceedings of the 37th annual international symposium on Computer architecture
A Network Congestion-Aware Memory Controller
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Quantifying and coping with parametric variations in 3D-stacked microarchitectures
Proceedings of the 47th Design Automation Conference
A multilayer nanophotonic interconnection network for on-chip many-core communications
Proceedings of the 47th Design Automation Conference
MEDICS: ultra-portable processing for medical image reconstruction
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Handling the problems and opportunities posed by multiple on-chip memory controllers
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An efficient distributed memory interface for many-core platform with 3D stacked DRAM
Proceedings of the Conference on Design, Automation and Test in Europe
Efficient OpenMP data mapping for multicore platforms with vertically stacked memory
Proceedings of the Conference on Design, Automation and Test in Europe
Hardware trust implications of 3-D integration
WESS '10 Proceedings of the 5th Workshop on Embedded Systems Security
Vertical stealing: robust, locality-aware do-all workload distribution for 3D MPSoCs
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Design exploration of hybrid caches with disparate memory technologies
ACM Transactions on Architecture and Code Optimization (TACO)
Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Should we worry about memory loss?
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Design and management of 3D-stacked NUCA cache for chip multiprocessors
Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
Microprocessors & Microsystems
Proceedings of the 38th annual international symposium on Computer architecture
Balance principles for algorithm-architecture co-design
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Analysis and mitigation of lateral thermal blockage effect of through-silicon-via in 3D IC designs
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Layout effects in fine grain 3D integrated regular microprocessor blocks
Proceedings of the 48th Design Automation Conference
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Efficient memory management of a hierarchical and a hybrid main memory for MN-MATE platform
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
A register-file approach for row buffer caches in die-stacked DRAMs
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A qualitative security analysis of a new class of 3-d integrated crypto co-processors
Cryptography and Security
Three-dimensional Integrated Circuits: Design, EDA, and Architecture
Foundations and Trends in Electronic Design Automation
New memory organizations for 3d DRAM and PCMs
ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
Distributed sensor data processing for many-cores
Proceedings of the great lakes symposium on VLSI
Proceedings of the 49th Annual Design Automation Conference
On the communication complexity of 3D FFTs and its implications for Exascale
Proceedings of the 26th ACM international conference on Supercomputing
Design of low power 3D hybrid memory by non-volatile CBRAM-crossbar with block-level data-retention
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
A micro-architectural analysis of switched photonic multi-chip interconnects
Proceedings of the 39th Annual International Symposium on Computer Architecture
Survey of scheduling techniques for addressing shared resources in multicore processors
ACM Computing Surveys (CSUR)
A distributed interleaving scheme for efficient access to WideIO DRAM memory
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Yield Improvement for 3D Wafer-to-Wafer Stacked Memories
Journal of Electronic Testing: Theory and Applications
Asymmetric DRAM synthesis for heterogeneous chip multiprocessors in 3D-stacked architecture
Proceedings of the International Conference on Computer-Aided Design
Distributed memory interface synthesis for network-on-chips with 3D-stacked DRAMs
Proceedings of the International Conference on Computer-Aided Design
Predictability for timing and temperature in multiprocessor system-on-chip platforms
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
Cluster-based topologies for 3D Networks-on-Chip using advanced inter-layer bus architecture
Journal of Computer and System Sciences
A multi-core memory organization for 3-d DRAM as main memory
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Computers and Electrical Engineering
RFiof: an RF approach to I/O-pin and memory controller scalability for off-chip memories
Proceedings of the ACM International Conference on Computing Frontiers
Geometric approach to chip-scale TSV shield placement for the reduction of TSV coupling in 3D-ICs
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Future memory and interconnect technologies
Proceedings of the Conference on Design, Automation and Test in Europe
3D-MMC: a modular 3D multi-core architecture with efficient resource pooling
Proceedings of the Conference on Design, Automation and Test in Europe
Proceedings of the 40th Annual International Symposium on Computer Architecture
Resilient die-stacked DRAM caches
Proceedings of the 40th Annual International Symposium on Computer Architecture
A network congestion-aware memory subsystem for manycore
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Wireless Health Systems, On-Chip and Off-Chip Network Architectures
Co-tuning of a hybrid electronic-optical network for reducing energy consumption in embedded CMPs
Proceedings of the First International Workshop on Many-core Embedded Systems
Pragmatic integration of an SRAM row cache in heterogeneous 3-D DRAM architecture using TSV
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Software-controlled transparent management of heterogeneous memory resources in virtualized systems
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
A new perspective on processing-in-memory architecture design
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Analysis and runtime management of 3D systems with stacked DRAM for boosting energy efficiency
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
On effective TSV repair for 3D-stacked ICs
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
An energy efficient DRAM subsystem for 3D integrated SoCs
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Exploring DRAM organizations for energy-efficient and resilient exascale memories
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
MMSoC: a multi-layer multi-core storage-on-chip design for systems with high integration
Proceedings of the 14th International Conference on Computer Systems and Technologies
Centip3De: a many-core prototype exploring 3D integration and near-threshold computing
Communications of the ACM
ACM Transactions on Architecture and Code Optimization (TACO)
Dynamic bandwidth scaling for embedded DSPs with 3D-stacked DRAM and wide I/Os
Proceedings of the International Conference on Computer-Aided Design
Direct distributed memory access for CMPs
Journal of Parallel and Distributed Computing
Yield-enhancement schemes for multicore processor and memory stacked 3D ICs
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.02 |
Three-dimensional integration enables stacking memory directly on top of a microprocessor, thereby significantly reducing wire delay between the two. Previous studies have examined the performance benefits of such an approach, but all of these works only consider commodity 2D DRAM organizations. In this work, we explore more aggressive 3D DRAM organizations that make better use of the additional die-to-die bandwidth provided by 3D stacking, as well as the additional transistor count. Our simulation results show that with a few simple changes to the 3D-DRAM organization, we can achieve a 1.75x speedup over previously proposed 3D-DRAM approaches on our memory-intensive multi-programmed workloads on a quad-core processor. The significant increase in memory system performance makes the L2 miss handling architecture (MHA) a new bottleneck, which we address by combining a novel data structure called the Vector Bloom Filter with dynamic MSHR capacity tuning. Our scalable L2 MHA yields an additional 17.8% performance improvement over our 3D-stacked memory architecture.