ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Scalable and reliable communication for hardware transactional memory
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Improving support for locality and fine-grain sharing in chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Tradeoffs in designing accelerator architectures for visual computing
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Architecting phase change memory as a scalable dram alternative
Proceedings of the 36th annual international symposium on Computer architecture
Hybrid cache architecture with disparate memory technologies
Proceedings of the 36th annual international symposium on Computer architecture
Exploration of 3D stacked L2 cache design for high performance and efficient thermal control
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Enabling ultra low voltage system operation by tolerating on-chip cache failures
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
ZerehCache: armoring cache architectures in high defect density technologies
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Preemptive virtual clock: a flexible, efficient, and cost-effective QOS scheme for networks-on-chip
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Micro-pages: increasing DRAM efficiency with locality-aware data placement
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
LRU-PEA: a smart replacement policy for non-uniform cache architectures on chip multiprocessors
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Empirical performance models for 3T1D memories
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
The auction: optimizing banks usage in Non-Uniform Cache Architectures
Proceedings of the 24th ACM International Conference on Supercomputing
Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis
Proceedings of the 37th annual international symposium on Computer architecture
Rethinking DRAM design and organization for energy-constrained multi-cores
Proceedings of the 37th annual international symposium on Computer architecture
Sentry: light-weight auxiliary memory access control
Proceedings of the 37th annual international symposium on Computer architecture
Necromancer: enhancing system throughput by animating dead cores
Proceedings of the 37th annual international symposium on Computer architecture
NCXplore: a design space exploration framework of temporal encoding for on-chip serial interconnects
International Journal of High Performance Systems Architecture
WAYPOINT: scaling coherence to thousand-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
SPACE: sharing pattern-based directory coherence for multicore scalability
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Handling the problems and opportunities posed by multiple on-chip memory controllers
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
SWEL: hardware cache coherence protocols to map shared data onto shared caches
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An integrated framework for joint design space exploration of microarchitecture and circuits
Proceedings of the Conference on Design, Automation and Test in Europe
ORION 2.0: a fast and accurate NoC power and area model for early-stage design space exploration
Proceedings of the Conference on Design, Automation and Test in Europe
Design exploration of hybrid caches with disparate memory technologies
ACM Transactions on Architecture and Code Optimization (TACO)
Thread owned block cache: managing latency in many-core architecture
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
The ZCache: Decoupling Ways and Associativity
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
3D floorplanning of low-power and area-efficient Network-on-Chip architecture
Microprocessors & Microsystems
Reducing Network-on-Chip energy consumption through spatial locality speculation
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Architecting on-chip interconnects for stacked 3D STT-RAM caches in CMPs
Proceedings of the 38th annual international symposium on Computer architecture
OUTRIDER: efficient memory latency tolerance with decoupled strands
Proceedings of the 38th annual international symposium on Computer architecture
A case for globally shared-medium on-chip interconnect
Proceedings of the 38th annual international symposium on Computer architecture
Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees
Proceedings of the 38th annual international symposium on Computer architecture
The role of optics in future high radix switch design
Proceedings of the 38th annual international symposium on Computer architecture
Evaluating placement policies for managing capacity sharing in CMP architectures with private caches
ACM Transactions on Architecture and Code Optimization (TACO)
T&I engine: traversal and intersection engine for hardware accelerated ray tracing
Proceedings of the 2011 SIGGRAPH Asia Conference
Emerging non-volatile memories: opportunities and challenges
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
The migration prefetcher: Anticipating data promotion in dynamic NUCA caches
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Architecture support for disciplined approximate programming
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Bundled execution of recurring traces for energy-efficient general purpose processing
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Performance/Thermal-Aware Design of 3D-Stacked L2 Caches for CMPs
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Using partial tag comparison in low-power snoop-based chip multiprocessors
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Improving server performance on multi-cores via selective off-loading of OS functionality
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Topology-Aware quality-of-service support in highly integrated chip multiprocessors
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
A signature-based power model for MPSoC on FPGA
VLSI Design
Proceedings of the 49th Annual Design Automation Conference
A hybrid NoC design for cache coherence optimization for chip multiprocessors
Proceedings of the 49th Annual Design Automation Conference
Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
MAC: migration-aware compilation for STT-RAM based hybrid cache in embedded systems
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Enhancing effective throughput for transmission line-based bus
Proceedings of the 39th Annual International Symposium on Computer Architecture
Proceedings of the 39th Annual International Symposium on Computer Architecture
LIGERO: A light but efficient router conceived for cache-coherent chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Exploiting semantics of virtual memory to improve the efficiency of the on-chip memory system
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Cluster-based topologies for 3D Networks-on-Chip using advanced inter-layer bus architecture
Journal of Computer and System Sciences
Towards a multiple-ISA embedded system
Journal of Systems Architecture: the EUROMICRO Journal
Inter-warp instruction temporal locality in deep-multithreaded GPUs
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Shrinking l1 instruction caches to improve energy: delay in SMT embedded processors
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
NOC-Out: Microarchitecting a Scale-Out Processor
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Inferred Models for Dynamic and Sparse Hardware-Software Spaces
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Neural Acceleration for General-Purpose Approximate Programs
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
A hardware unit for fast SAH-optimised BVH construction
ACM Transactions on Graphics (TOG) - SIGGRAPH 2013 Conference Proceedings
Replacement techniques for dynamic NUCA cache designs on CMPs
The Journal of Supercomputing
LP-NUCA: networks-in-cache for high-performance low-power embedded processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Application-driven end-to-end traffic predictions for low power NoC design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Exploring the vulnerability of CMPs to soft errors with 3D stacked nonvolatile memory
ACM Journal on Emerging Technologies in Computing Systems (JETC)
SGRT: a mobile GPU architecture for real-time ray tracing
Proceedings of the 5th High-Performance Graphics Conference
An energy and bandwidth efficient ray tracing architecture
Proceedings of the 5th High-Performance Graphics Conference
Ordering circuit establishment in multiplane NoCs
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Location-aware cache management for many-core processors with deep cache hierarchy
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Meeting midway: improving CMP performance with memory-side prefetching
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Low-energy volatile STT-RAM cache design using cache-coherence-enabled adaptive refresh
ACM Transactions on Design Automation of Electronic Systems (TODAES)
DESC: energy-efficient data exchange using synchronized counters
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
SHIFT: shared history instruction fetch for lean-core server processors
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
The reuse cache: downsizing the shared last-level cache
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Energy efficient GPU transactional memory via space-time optimizations
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Meet the walkers: accelerating index traversals for in-memory databases
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Integrated 3D-stacked server designs for increasing physical density of key-value stores
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Exploiting replication to improve performances of NUCA-based CMP systems
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Configurable range memory for effective data reuse on programmable accelerators
ACM Transactions on Design Automation of Electronic Systems (TODAES)
HARP: Harnessing inactive threads in many-core processors
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.00 |
A significant part of future microprocessor real estate will be dedicated to L2 or L3 caches. These on-chip caches will heavily impact processor perfor- mance, power dissipation, and thermal management strategies. There are a number of interconnect design considerations that influence power/performance/area characteristics of large caches, such as wire mod- els (width/spacing/repeaters), signaling strategy (RC/differential/transmission), router design, etc. Yet, to date, there exists no analytical tool that takes all of these parameters into account to carry out a design space exploration for large caches and estimate an optimal organization. In this work, we implement two major extensions to the CACTI cache modeling tool that focus on interconnect design for a large cache. First, we add the ability to model different types of wires, such as RC-based wires with different power/delay characteristics and differential low-swing buses. Second, we add the ability to model Non-uniform Cache Access (NUCA). We not only adopt state-of-the-art design space exploration strategies for NUCA, we also enhance this exploration by considering on-chip network contention and a wider spectrum of wiring and routing choices. We present a validation analysis of the new tool (to be released as CACTI 6.0) and present a case study to showcase how the tool can improve architecture research methodologies. Keywords: cache models, non-uniform cache archi- tectures (NUCA), memory hierarchies, on-chip intercon- nects.