Garbage collection for multicore NUMA machines
Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
How soccer players would do stream joins
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Two-hop free-space based optical interconnects for chip multiprocessors
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks
Proceedings of the 38th annual international symposium on Computer architecture
Why nothing matters: the impact of zeroing
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Memory Performance And SPEC OpenMP scalability on quad-socket x86 64 systems
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part I
ABS: A low-cost adaptive controller for prefetching in a banked shared last-level cache
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Building a scalable and portable message-passing library for embedded multicore systems
Proceedings of the 2011 ACM Symposium on Research in Applied Computation
An efficient software shared virtual memory for the single-chip cloud computer
Proceedings of the Second Asia-Pacific Workshop on Systems
Switch-based packing technique to reduce traffic and latency in token coherence
Journal of Parallel and Distributed Computing
Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Why on-chip cache coherence is here to stay
Communications of the ACM
Improving coherence protocol reactiveness by trading bandwidth for latency
Proceedings of the 9th conference on Computing Frontiers
Virtualization of reconfigurable coprocessors in HPRC systems with multicore architecture
Journal of Systems Architecture: the EUROMICRO Journal
Node-based memory management for scalable NUMA architectures
Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers
Characterizing and Understanding PDES Behavior on Tilera Architecture
PADS '12 Proceedings of the 2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation
A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Spatiotemporal Coherence Tracking
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
The locality-aware adaptive cache coherence protocol
Proceedings of the 40th Annual International Symposium on Computer Architecture
Interference resilient PDES on multi-core systems: towards proportional slowdown
Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation
LP-NUCA: networks-in-cache for high-performance low-power embedded processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Scaling LAPACK panel operations using parallel cache assignment
ACM Transactions on Mathematical Software (TOMS)
Ordering circuit establishment in multiplane NoCs
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Design and performance evaluation of NUMA-aware RDMA-based end-to-end data transfer systems
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Everything you always wanted to know about synchronization but were afraid to ask
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Building expressive, area-efficient coherence directories
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
A generic high-performance method for deinterleaving scientific data
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Throughput-memory footprint trade-off in synthesis of streaming software on embedded multiprocessors
ACM Transactions on Embedded Computing Systems (TECS)
PAIS: Parallelism-aware interconnect scheduling in multicores
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.02 |
The 12-core AMD Opteron processor, code-named "Magny Cours," combines advances in silicon, packaging, interconnect, cache coherence protocol, and server architecture to increase the compute density of high-volume commodity 2P/4P blade servers while operating within the same power envelope as earlier-generation AMD Opteron processors. A key enabling feature, the probe filter, reduces both the bandwidth overhead of traditional broadcast-based coherence and memory latency.