Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

Authors:
Milo M. K. Martin; Daniel J. Sorin; Bradford M. Beckmann; Michael R. Marty; Min Xu; Alaa R. Alameldeen; Kevin E. Moore; Mark D. Hill; David A. Wood
Affiliations:
Univ. of Pennsylvania; Duke Univ.; Univ. of Wisconsin-Madison; Univ. of Wisconsin-Madison; Univ. of Wisconsin-Madison; Univ. of Wisconsin-Madison; Univ. of Wisconsin-Madison; Univ. of Wisconsin-Madison; Univ. of Wisconsin-Madison
Venue:
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Year:
2005

References: 17
Cited by: 324

The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
The directory-based cache coherence protocol for the DASH multiprocessor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Full-system timing-first simulation

SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol

IEEE Transactions on Parallel and Distributed Systems
Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers

Addison-Wesley (book)
Complete Computer System Simulation: The SimOS Approach

IEEE Parallel & Distributed Technology: Systems & Applications
Simics: A Full System Simulation Platform

Computer
SimpleScalar: An Infrastructure for Computer System Modeling

Computer
Verifying a Multiprocessor Cache Controller Using Random Test Generation

IEEE Design & Test
Simulating a $2M Commercial Server on a $2K PC

Computer
Protocol Verification as a Hardware Design Aid

ICCD '92 Proceedings of the 1992 IEEE International Conference on Computer Design: VLSI in Computers & Processors
The AMD Opteron Processor for Multiprocessor Servers

IEEE Micro
The Alpha 21364 Network Architecture

HOTI '01 Proceedings of the Ninth Symposium on High Performance Interconnects
Fingerprinting: bounding soft-error detection latency and bandwidth

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Managing Wire Delay in Large Chip-Multiprocessor Caches

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Improving Multiple-CMP Systems Using Token Coherence

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Token Coherence: A New Framework for Shared-Memory Multiprocessors

IEEE Micro

Cited by:

Maximizing CMP Throughput with Mediocre Cores

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
DRAMsim: a memory system simulator

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations

IEEE Transactions on Computers
Cooperative Caching for Chip Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
Interconnect-Aware Coherence Protocols for Chip Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
Spin Detection Hardware for Improved Management of Multithreaded Systems

IEEE Transactions on Parallel and Distributed Systems
Architectural support for operating system-driven CMP cache management

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
IPC Considered Harmful for Multiprocessor Workloads

IEEE Micro
The M5 Simulator: Modeling Networked Systems

IEEE Micro
A regulated transitive reduction (RTR) for longer memory race recording

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Hybrid transactional memory

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Supporting nested transactional memory in logTM

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Leveraging Wire Properties at the Microarchitecture Level

IEEE Micro
Coherence Ordering for Ring-based Chip Multiprocessors

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Nonblocking transactions without indirection using alert-on-update

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Making the fast case common and the uncommon case simple in unbounded transactional memory

Proceedings of the 34th annual international symposium on Computer architecture
Virtual hierarchies to support server consolidation

Proceedings of the 34th annual international symposium on Computer architecture
Performance pathologies in hardware transactional memory

Proceedings of the 34th annual international symposium on Computer architecture
An integrated hardware-software approach to flexible transactional memory

Proceedings of the 34th annual international symposium on Computer architecture
Rotary router: an efficient architecture for CMP interconnection networks

Proceedings of the 34th annual international symposium on Computer architecture
ParallAX: an architecture for real-time physics

Proceedings of the 34th annual international symposium on Computer architecture
Architectural implications of brick and mortar silicon manufacturing

Proceedings of the 34th annual international symposium on Computer architecture
Managing energy-performance tradeoffs for multithreaded applications on multiprocessor architectures

Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Cooperative cache partitioning for chip multiprocessors

Proceedings of the 21st annual international conference on Supercomputing
RAMP: Research Accelerator for Multiple Processors

IEEE Micro
The impact of wrong-path memory references in cache-coherent multiprocessor systems

Journal of Parallel and Distributed Computing
A complexity-effective architecture for accelerating full-system multiprocessor simulations using FPGAs

Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Split hardware transactions: true nesting of transactions using best-effort hardware transactional memory

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Understanding the propagation of hard errors to software and implications for resilient system design

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Adaptive transaction scheduling for transactional memory systems

Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Using Hardware Memory Protection to Build a High-Performance, Strongly-Atomic Hybrid Transactional Memory

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
TokenTM: Efficient Execution of Large Transactions with Hardware Transactional Memory

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Flexible Decoupled Transactional Memory Support

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Rerun: Exploiting Episodes for Lightweight Memory Race Recording

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Energy-efficient MESI cache coherence with pro-active snoop filtering for multicore microprocessors

Proceedings of the 13th international symposium on Low power electronics and design
Full-system chip multiprocessor power evaluations using FPGA-based emulation

Proceedings of the 13th international symposium on Low power electronics and design
Reducing the Interconnection Network Cost of Chip Multiprocessors

NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
SP-NUCA: a cost effective dynamic non-uniform cache architecture

ACM SIGARCH Computer Architecture News
An energy consumption characterization of on-chip interconnection networks for tiled CMP architectures

The Journal of Supercomputing
MCjammer: adaptive verification for multi-core designs

Proceedings of the conference on Design, automation and test in Europe
Extending CC-NUMA systems to support write update optimizations

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Distributed cooperative caching

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Improving support for locality and fine-grain sharing in chip multiprocessors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Protocol offload analysis by simulation

Journal of Systems Architecture: the EUROMICRO Journal
Version management alternatives for hardware transactional memory

Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
MC-Sim: an efficient simulation tool for MPSoC designs

Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Frequent value compression in packet-based NoC architectures

Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Two hardware-based approaches for deterministic multiprocessor replay

Communications of the ACM - One Laptop Per Child: Vision vs. Reality
Token tenure: PATCHing token counting using directory-based cache coherence

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Adaptive data compression for high-performance low-power on-chip networks

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Notary: Hardware techniques to enhance signatures

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Toward a multicore architecture for real-time ray-tracing

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Extending concurrency of transactional memory programs by using value prediction

Proceedings of the 6th ACM conference on Computing frontiers
ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
DSiMCluster: A Simulation Model for Efficient Memory Analysis Experiments of DSM Clusters

Simulation
Refereeing conflicts in hardware transactional memory

Proceedings of the 23rd international conference on Supercomputing
Limited early value communication to improve performance of transactional memory

Proceedings of the 23rd international conference on Supercomputing
A durable and energy efficient main memory using phase change memory technology

Proceedings of the 36th annual international symposium on Computer architecture
Memory mapped ECC: low-cost error protection for last level caches

Proceedings of the 36th annual international symposium on Computer architecture
Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors

Proceedings of the 36th annual international symposium on Computer architecture
Triplet-based topology for on-chip networks

WSEAS Transactions on Computers
How to simulate 1000 cores

ACM SIGARCH Computer Architecture News
NZTM: nonblocking zero-indirection transactional memory

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
The use of hardware transactional memory for the trace-based parallelization of recursive Java programs

PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Exploring concentration and channel slicing in on-chip network router

NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Dealing with Traffic-Area Trade-Off in Direct Coherence Protocols for Many-Core CMPs

APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
An Efficient Lightweight Shared Cache Design for Chip Multiprocessors

APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
L1 Collective Cache: Managing Shared Data for Chip Multiprocessors

APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
Hybrid Techniques for Fast Multicore Simulation

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
REPAS: Reliable Execution for Parallel ApplicationS in Tiled-CMPs

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
MPTLsim: a simulator for X86 multicore processors

Proceedings of the 46th Annual Design Automation Conference
Full-system simulation of distributed memory multicomputers

Cluster Computing
A performance evaluation of 2D-mesh, ring, and crossbar interconnects for chip multi-processors

Proceedings of the 2nd International Workshop on Network on Chip Architectures
Future scaling of processor-memory interfaces

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Flexible cache error protection using an ECC FIFO

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
In-network coherence filtering: snoopy coherence without broadcasts

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Ordering decoupled metadata accesses in multiprocessors

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Optimizing shared cache behavior of chip multiprocessors

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
SHARP control: controlled shared cache management in chip multiprocessors

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Multicore architectures with dynamically reconfigurable array processors for wireless broadband technologies

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Accurately evaluating application performance in simulated hybrid multi-tasking systems

Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
New techniques for simulating high performance MPI applications on large storage networks

The Journal of Supercomputing
Flexible architectural support for fine-grain scheduling

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Specifying and dynamically verifying address translation-aware memory consistency

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Inter-core cooperative TLB for chip multiprocessors

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Virtualized and flexible ECC for main memory

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
An analysis of on-chip interconnection networks for large-scale chip multiprocessors

ACM Transactions on Architecture and Code Optimization (TACO)
Network interfaces for programmable NICs and multicore platforms

Computer Networks: The International Journal of Computer and Telecommunications Networking
Specification-based Verification in a Distributed Shared Memory Simulation Model

Simulation
A scalable organization for distributed directories

Journal of Systems Architecture: the EUROMICRO Journal
Two-phase trace-driven simulation (TPTS): a fast multicore processor architecture simulation approach

Software: Practice and Experience
Accelerating multi-core simulators

Proceedings of the 2010 ACM Symposium on Applied Computing
A virtual platform environment for exploring power, thermal and reliability management control strategies in high-performance multicores

Proceedings of the 20th Great Lakes Symposium on VLSI
On-chip communication and synchronization mechanisms with cache-integrated network interfaces

Proceedings of the 7th ACM international conference on Computing frontiers
Exploit temporal locality of shared data in SRC enabled CMP

NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
Architectural implications of cache coherence protocols with network applications on chip multiprocessors

NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
Directory-based conflict detection in hardware transactional memory

HiPC'08 Proceedings of the 15th international conference on High performance computing
Fault-tolerant cache coherence protocols for CMPs: evaluation and trade-offs

HiPC'08 Proceedings of the 15th international conference on High performance computing
LRU-PEA: a smart replacement policy for non-uniform cache architectures on chip multiprocessors

ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
A session key caching and prefetching scheme for secure communication in cluster systems

Journal of Parallel and Distributed Computing
Cache topology aware computation mapping for multicores

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
The auction: optimizing banks usage in Non-Uniform Cache Architectures

Proceedings of the 24th ACM International Conference on Supercomputing
A Component-based Simulator for MIPS32 Processors

Simulation
Characterizing the soft error vulnerability of multicores running multithreaded applications

Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
WiDGET: Wisconsin decoupled grid execution tiles

Proceedings of the 37th annual international symposium on Computer architecture
Forwardflow: a scalable core for power-constrained CMPs

Proceedings of the 37th annual international symposium on Computer architecture
The virtual write queue: coordinating DRAM and last-level cache policies

Proceedings of the 37th annual international symposium on Computer architecture
Timetraveler: exploiting acyclic races for optimizing memory race recording

Proceedings of the 37th annual international symposium on Computer architecture
A case for FAME: FPGA architecture model execution

Proceedings of the 37th annual international symposium on Computer architecture
Elastic cooperative caching: an autonomous dynamically adaptive memory hierarchy for chip multiprocessors

Proceedings of the 37th annual international symposium on Computer architecture
Addressing Manufacturing Challenges with Cost-Efficient Fault Tolerant Routing

NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Ultra Fine-Grained Run-Time Power Gating of On-chip Routers for CMPs

NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Power-Efficient and High-Performance Multi-level Hybrid Nanophotonic Interconnect for Multicores

NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Performance Evaluation of a Multicore System with Optically Connected Memory Modules

NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Virtual channels vs. multiple physical networks: a comparative analysis

Proceedings of the 47th Design Automation Conference
Trace-driven optimization of networks-on-chip configurations

Proceedings of the 47th Design Automation Conference
RAMP gold: an FPGA-based architecture simulator for multiprocessors

Proceedings of the 47th Design Automation Conference
Hardware transactional memory: A high performance parallel programming model

Journal of Systems Architecture: the EUROMICRO Journal
Token tenure and PATCH: A predictive/adaptive token-counting hybrid

ACM Transactions on Architecture and Code Optimization (TACO)
Implementation tradeoffs in the design of flexible transactional memory support

Journal of Parallel and Distributed Computing
A practical way to extend shared memory support beyond a motherboard at low cost

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
System-level max power (SYMPO): a systematic approach for escalating system-level power consumption using synthetic benchmarks

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Subspace snooping: filtering snoops with operating system support

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proximity coherence for chip multiprocessors

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
SPACE: sharing pattern-based directory coherence for multicore scalability

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
ReSim, a trace-driven, reconfigurable ILP processor simulator

Proceedings of the Conference on Design, Automation and Test in Europe
CASPAR: hardware patching for multi-core processors

Proceedings of the Conference on Design, Automation and Test in Europe
Group-caching for NoC based multicore cache coherent systems

Proceedings of the Conference on Design, Automation and Test in Europe
FastFwd: an efficient hardware acceleration technique for trace-driven network-on-chip simulation

CODES+ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Accelerating data movement on future chip multi-processors

Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
Interval filter: a locality-aware alternative to bloom filters for hardware membership queries by interval classification

IDEAL'10 Proceedings of the 11th international conference on Intelligent data engineering and automated learning
On-Chip Network Evaluation Framework

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
CPM in CMPs: Coordinated Power Management in Chip-Multiprocessors

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Power-efficient spilling techniques for chip multiprocessors

EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Quantifying and reducing the effects of wrong-path memory references in cache-coherent multiprocessor systems

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Architectural support for thread communications in multi-core processors

Parallel Computing
An analytical network performance model for SIMD processor CSX600 interconnects

Journal of Systems Architecture: the EUROMICRO Journal
A generic adaptive path-based routing method for MPSoCs

Journal of Systems Architecture: the EUROMICRO Journal
Thread criticality support in on-chip networks

Proceedings of the Third International Workshop on Network on Chip Architectures
A variable-pipeline on-chip router optimized to traffic pattern

Proceedings of the Third International Workshop on Network on Chip Architectures
A power-efficient network on-chip topology

Proceedings of the Fifth International Workshop on Interconnection Network Architecture: On-Chip, Multi-Chip
Architectural Support for Fair Reader-Writer Locking

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive Flow Control for Robust Performance and Energy

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Fractal Coherence: Scalably Verifiable Cache Coherence

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Probabilistic Distance-Based Arbitration: Providing Equality of Service for Many-Core CMPs

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Hardware Support for Relaxed Concurrency Control in Transactional Memory

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A Dynamically Adaptable Hardware Transactional Memory

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
COREMU: a scalable and portable parallel full-system emulator

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Virtualizing network-on-chip resources in chip-multiprocessors

Microprocessors & Microsystems
Design of a performance enhanced and power reduced dual-crossbar Network-on-Chip (NoC) architecture

Microprocessors & Microsystems
Adaptive timekeeping replacement: Fine-grained capacity management for shared CMP caches

ACM Transactions on Architecture and Code Optimization (TACO)
CoQoS: Coordinating QoS-aware shared resources in NoC-based SoCs

Journal of Parallel and Distributed Computing
A framework for architecture-level power, area, and thermal simulation and its application to network-on-chip design exploration

ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
A statistical performance model of the opteron processor

ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Studying inter-core data reuse in multicores

Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Multiset signatures for transactional memory

Proceedings of the international conference on Supercomputing
ZEBRA: a data-centric, hybrid-policy hardware transactional memory design

Proceedings of the international conference on Supercomputing
Predictive coordination of multiple on-chip resources for chip multiprocessors

Proceedings of the international conference on Supercomputing
Karma: scalable deterministic record-replay

Proceedings of the international conference on Supercomputing
A vertical bubble flow network using inductive-coupling for 3-D CMPs

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Exploring partitioning methods for 3D Networks-on-Chip utilizing adaptive routing model

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
A distributed and topology-agnostic approach for on-line NoC testing

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
DART: a programmable architecture for NoC simulation on FPGAs

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Inferring packet dependencies to improve trace based simulation of on-chip networks

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks

Proceedings of the 38th annual international symposium on Computer architecture
Sampling + DMR: practical and low-overhead permanent fault detection

Proceedings of the 38th annual international symposium on Computer architecture
An abacus turn model for time/space-efficient reconfigurable routing

Proceedings of the 38th annual international symposium on Computer architecture
Studying inter-core data reuse in multicores

ACM SIGMETRICS Performance Evaluation Review
An energy-efficient adaptive hybrid cache

Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement

Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
NoC frequency scaling with flexible-pipeline routers

Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
The gem5 simulator

ACM SIGARCH Computer Architecture News
DRAIN: distributed recovery architecture for inaccessible nodes in multi-core chips

Proceedings of the 48th Design Automation Conference
Enabling system-level modeling of variation-induced faults in networks-on-chips

Proceedings of the 48th Design Automation Conference
A helper thread based dynamic cache partitioning scheme for multithreaded applications

Proceedings of the 48th Design Automation Conference
A reuse-aware prefetching scheme for scratchpad memory

Proceedings of the 48th Design Automation Conference
Design of a new cloud computing simulation platform

ICCSA'11 Proceedings of the 2011 international conference on Computational science and its applications - Volume Part III
Evaluation of low-overhead organizations for the directory in future many-core CMPs

Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
Multilayer cache partitioning for multiprogram workloads

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Token3D: reducing temperature in 3d die-stacked CMPs through cycle-level power control mechanisms

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Unified locality-sensitive signatures for transactional memory

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Poster: dissection the version management schemes in hardware transactional memory systems

ACM SIGMETRICS Performance Evaluation Review - Special Issue on IFIP PERFORMANCE 2011, the 29th International Symposium on Computer Performance, Modeling, Measurement and Evaluation
HC-Sim: a fast and exact l1 cache simulator with scratchpad memory co-simulation support

CODES+ISSS '11 Proceedings of the ninth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Exploiting temporal decoupling to accelerate trace-driven NoC emulation

CODES+ISSS '11 Proceedings of the ninth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Kismet: parallel speedup estimates for serial programs

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
MAximum Multicore POwer (MAMPO): an automatic multithreaded synthetic power virus generation framework for multicore systems

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
A Canonical Multicore Architecture for Network Routers

Proceedings of the 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems
HPC-Mesh: A Homogeneous Parallel Concentrated Mesh for Fault-Tolerance and Energy Savings

Proceedings of the 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems
Compiler support for concurrency synchronization

ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part I
An FPGA-based scalable simulation accelerator for tile architectures

ACM SIGARCH Computer Architecture News
ABS: A low-cost adaptive controller for prefetching in a banked shared last-level cache

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
DAPSCO: Distance-aware partially shared cache organization

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Hardware transactional memory with software-defined conflicts

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
On the simulation of large-scale architectures using multiple application abstraction levels

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
The migration prefetcher: Anticipating data promotion in dynamic NUCA caches

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
VSim: Simulating multi-server setups at near native hardware speed

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Acceleration techniques for chip-multiprocessor simulator debug

ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
A highly robust distributed fault-tolerant routing algorithm for NoCs with localized rerouting

Proceedings of the 2012 Interconnection Network Architecture: On-Chip, Multi-Chip Workshop
Switch-based packing technique to reduce traffic and latency in token coherence

Journal of Parallel and Distributed Computing
Improving shared cache behavior of multithreaded object-oriented applications in multicores

Proceedings of the International Conference on Computer-Aided Design
Assuring application-level correctness against soft errors

Proceedings of the International Conference on Computer-Aided Design
Co-design of channel buffers and crossbar organizations in NoCs architectures

Proceedings of the International Conference on Computer-Aided Design
Improving System Energy Efficiency with Memory Rank Subsetting

ACM Transactions on Architecture and Code Optimization (TACO)
Efficient trace-driven metaheuristics for optimization of networks-on-chip configurations

Proceedings of the International Conference on Computer-Aided Design
Efficient memory management of a hierarchical and a hybrid main memory for MN-MATE platform

Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Relyzer: exploiting application-level fault equivalence to analyze application resiliency to transient faults

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Applying transactional memory to concurrency bugs

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Hardware support for OpenMP collective operations

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Minimalist open-page: a DRAM page-mode scheduling policy for the many-core era

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A systematic methodology to develop resilient cache coherence protocols

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
FeatherWeight: low-cost optical arbitration with QoS support

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Manager-client pairing: a framework for implementing coherence hierarchies

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A data layout optimization framework for NUCA-based multicores

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Full system simulation of many-core heterogeneous SoCs using GPU and QEMU semihosting

Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Improving performance by reducing aborts in hardware transactional memory

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
SRP: symbiotic resource partitioning of the memory hierarchy in CMPs

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Efficient transaction nesting in hardware transactional memory

ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Is reuse distance applicable to data locality analysis on chip multiprocessors?

CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Neighborhood-aware data locality optimization for NoC-based multicores

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Reliability-aware platform optimization for 3D chip multi-processors

The Journal of Supercomputing
Network-on-Chip virtualization in Chip-Multiprocessor Systems

Journal of Systems Architecture: the EUROMICRO Journal
Improving coherence protocol reactiveness by trading bandwidth for latency

Proceedings of the 9th conference on Computing Frontiers
Cost-effective power delivery to support per-core voltage domains for power-constrained processors

Proceedings of the 49th Annual Design Automation Conference
Attackboard: a novel dependency-aware traffic generator for exploring NoC design space

Proceedings of the 49th Annual Design Automation Conference
Quick detection of difficult bugs for effective post-silicon validation

Proceedings of the 49th Annual Design Automation Conference
A hybrid NoC design for cache coherence optimization for chip multiprocessors

Proceedings of the 49th Annual Design Automation Conference
Architecture support for accelerator-rich CMPs

Proceedings of the 49th Annual Design Automation Conference
Virtual-machine-based emulation of future generation high-performance computing systems

International Journal of High Performance Computing Applications
Real-time network-on-chip simulation modeling

Proceedings of the 5th International ICST Conference on Simulation Tools and Techniques
UniFI: leveraging non-volatile memories for a unified fault tolerance and idle power management technique

Proceedings of the 26th ACM international conference on Supercomputing
Simulating the future kilo-x86-64 core processors and their infrastructure

Proceedings of the 45th Annual Simulation Symposium
ASCIB: adaptive selection of cache indexing bits for removing conflict misses

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Energy-efficient non-minimal path on-chip interconnection network for heterogeneous systems

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Power-aware performance increase via core/uncore reinforcement control for chip-multiprocessors

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
BiN: a buffer-in-NUCA scheme for accelerator-rich CMPs

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Static and dynamic co-optimizations for blocks mapping in hybrid caches

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
CHARM: a composable heterogeneous accelerator-rich microprocessor

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Something old and something new: P-states can borrow microarchitecture techniques too

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
A new degree of freedom for memory allocation in clusters

Cluster Computing
CCTR: An efficient point-to-point memory race recorder implemented in chunks

Microprocessors & Microsystems
Asymmetric Cache Coherency: Policy Modifications to Improve Multicore Performance

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Efficient implementation of globally-aware network flow control

Journal of Parallel and Distributed Computing
APCR: an adaptive physical channel regulator for on-chip interconnects

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Complexity-effective multicore coherence

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
PS-Dir: a scalable two-level directory cache

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Shared hardware data structures for hard real-time systems

Proceedings of the tenth ACM international conference on Embedded software
Efficient traffic aware power management in multicore communications processors

Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems
LIGERO: A light but efficient router conceived for cache-coherent chip multiprocessors

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Exploiting reuse locality on inclusive shared last-level caches

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
An integrated pseudo-associativity and relaxed-order approach to hardware transactional memory

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
SCIN-cache: Fast speculative versioning in multithreaded cores

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Stream arbitration: Towards efficient bandwidth utilization for emerging on-chip interconnects

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
A high-efficiency low-cost heterogeneous 3D network-on-chip design

Proceedings of the Fifth International Workshop on Network on Chip Architectures
Bus and memory protection through chain-generated and tree-verified IV for multiprocessors systems

Future Generation Computer Systems
Prototyping hardware support for irregular applications

Proceedings of the 2013 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools
Architecture support for custom instructions with memory operations

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
High-performance and low-energy buffer mapping method for multiprocessor DSP systems

ACM Transactions on Embedded Computing Systems (TECS)
Cluster-based topologies for 3D Networks-on-Chip using advanced inter-layer bus architecture

Journal of Computer and System Sciences
TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs

ACM Transactions on Architecture and Code Optimization (TACO)
The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing

ACM Transactions on Architecture and Code Optimization (TACO)
Deploying hardware locks to improve performance and energy efficiency of hardware transactional memory

ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
NoRD: Node-Router Decoupling for Effective Power-gating of On-Chip Routers

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Addressing End-to-End Memory Access Latency in NoC-Based Multicores

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
TRANSIT: specifying protocols with concolic snippets

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
An efficient deterministic record-replay with separate dependencies

Computers and Electrical Engineering
CMP off-chip bandwidth scheduling guided by instruction criticality

Proceedings of the 27th international ACM conference on International conference on supercomputing
Prefetching and cache management using task lifetimes

Proceedings of the 27th international ACM conference on International conference on supercomputing
Reuse-based online models for caches

Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems
Replacement techniques for dynamic NUCA cache designs on CMPs

The Journal of Supercomputing
A survey on cache tuning from a power/energy perspective

ACM Computing Surveys (CSUR)
Multi-processor architectural support for protecting virtual machine privacy in untrusted cloud environment

Proceedings of the ACM International Conference on Computing Frontiers
Orchestrator: a low-cost solution to reduce voltage emergencies for multi-threaded applications

Proceedings of the Conference on Design, Automation and Test in Europe
Overcoming post-silicon validation challenges through quick error detection (QED)

Proceedings of the Conference on Design, Automation and Test in Europe
Proactive aging management in heterogeneous NoCs through a criticality-driven routing approach

Proceedings of the Conference on Design, Automation and Test in Europe
Cache coherence enabled adaptive refresh for volatile STT-RAM

Proceedings of the Conference on Design, Automation and Test in Europe
CPU transparent protection of OS kernel and hypervisor integrity with programmable DRAM

Proceedings of the 40th Annual International Symposium on Computer Architecture
ZSim: fast and accurate microarchitectural simulation of thousand-core systems

Proceedings of the 40th Annual International Symposium on Computer Architecture
A new perspective for efficient virtual-cache coherence

Proceedings of the 40th Annual International Symposium on Computer Architecture
Protozoa: adaptive granularity cache coherence

Proceedings of the 40th Annual International Symposium on Computer Architecture
A complete self-testing and self-configuring NoC infrastructure for cost-effective MPSoCs

ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Wireless Health Systems, On-Chip and Off-Chip Network Architectures
Locality-aware task management for unstructured parallelism: a quantitative limit study

Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Design space exploration of FinFET cache

ACM Journal on Emerging Technologies in Computing Systems (JETC)
Exploring the vulnerability of CMPs to soft errors with 3D stacked nonvolatile memory

ACM Journal on Emerging Technologies in Computing Systems (JETC)
Dynamically reconfigurable hybrid cache: an energy-efficient last-level cache design

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
CATRA- congestion aware trapezoid-based routing algorithm for on-chip networks

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
An MILP-based aging-aware routing algorithm for NoCs

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hardware and software support for fine-grained memory access control and encapsulation in C++

Proceedings of the 2013 companion publication for conference on Systems, programming, & applications: software for humanity
Optimal placement of vertical connections in 3D Network-on-Chip

Journal of Systems Architecture: the EUROMICRO Journal
Memory-centric system interconnect design with hybrid memory cubes

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
The case for a scalable coherence protocol for complex on-chip cache hierarchies in many core systems

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Meeting midway: improving CMP performance with memory-side prefetching

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
McRouter: multicast within a router for high performance network-on-chips

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Low-energy volatile STT-RAM cache design using cache-coherence-enabled adaptive refresh

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Efficient multicast schemes for 3-D Networks-on-Chip

Journal of Systems Architecture: the EUROMICRO Journal
Decoupled compressed cache: exploiting spatial locality for energy-optimized compressed caching

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
uDIREC: unified diagnosis and reconfiguration for frugal bypass of NoC faults

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Imbalanced cache partitioning for balanced data-parallel programs

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
The reuse cache: downsizing the shared last-level cache

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Modeling the impact of permanent faults in caches

ACM Transactions on Architecture and Code Optimization (TACO)
Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
High-performance fractal coherence

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Techniques to improve performance in requester-wins hardware transactional memory

ACM Transactions on Architecture and Code Optimization (TACO)
TornadoNoC: A lightweight and scalable on-chip network architecture for the many-core era

ACM Transactions on Architecture and Code Optimization (TACO)
The case of using multiple streams in streaming

International Journal of Automation and Computing
Thread-criticality aware dynamic cache reconfiguration in multi-core system

Proceedings of the International Conference on Computer-Aided Design
Efficient execution of speculative threads and transactions with hardware transactional memory

Future Generation Computer Systems
Dual partitioning multicasting for high-performance on-chip networks

Journal of Parallel and Distributed Computing
Unified reliability estimation and management of NoC based chip multiprocessors

Microprocessors & Microsystems
Bi-LCQ: A low-weight clustering-based Q-learning approach for NoCs

Microprocessors & Microsystems
An effectiveness-based adaptive cache replacement policy

Microprocessors & Microsystems
PAIS: Parallelism-aware interconnect scheduling in multicores

ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols

Journal of Parallel and Distributed Computing
A novel architecture for ahead branch prediction

Frontiers of Computer Science: Selected Publications from Chinese Universities

Abstract

The Wisconsin Multifacet Project has created a simulation toolset to characterize and evaluate the performance of multiprocessor hardware systems commonly used as database and web servers. We use an existing full-system functional simulation infrastructure (Simics [14]) as the base on which we build a set of timing simulator modules that model the timing of the memory system and the microprocessors. This infrastructure lets us run architectural experiments using a suite of scaled-down commercial workloads [3]. To make such research easier for other researchers, we have released these timing simulator modules as the Multifacet General Execution-driven Multiprocessor Simulator (GEMS) Toolset, release 1.0, under the GNU GPL [9].
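The abstract describes a split between a functional simulator, which executes the workload correctly, and separate timing modules that decide how long operations take. The Python sketch below illustrates that separation in miniature. It is not GEMS or Simics code. The class names `MemOp`, `TimingModule`, and `FunctionalSimulator` are hypothetical, as are the per-CPU direct-mapped cache and the fixed 2-cycle hit and 100-cycle miss latencies. Cache coherence is ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class MemOp:
    """One memory operation issued by a CPU (hypothetical trace format)."""
    cpu: int
    addr: int
    is_write: bool


class TimingModule:
    """Hypothetical timing model: a private direct-mapped cache per CPU
    with fixed hit/miss latencies. It only answers 'how many cycles?'."""

    def __init__(self, num_sets=64, block_bytes=64, hit_cycles=2, miss_cycles=100):
        self.num_sets = num_sets
        self.block_bytes = block_bytes
        self.hit_cycles = hit_cycles
        self.miss_cycles = miss_cycles
        self.tags = {}  # (cpu, set index) -> tag of the resident block

    def access(self, op: MemOp) -> int:
        block = op.addr // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        key = (op.cpu, index)
        if self.tags.get(key) == tag:
            return self.hit_cycles
        self.tags[key] = tag  # fill the block on a miss
        return self.miss_cycles


class FunctionalSimulator:
    """Hypothetical functional simulator: applies each operation's effect
    and delegates all timing decisions to the timing module."""

    def __init__(self, num_cpus: int):
        self.memory = {}
        self.cycles = [0] * num_cpus

    def run(self, ops, timing: TimingModule):
        for op in ops:
            if op.is_write:
                # Functional effect; the stored value is a placeholder (writer's id).
                self.memory[op.addr] = op.cpu
            # The timing module decides how long this CPU stalls.
            self.cycles[op.cpu] += timing.access(op)
        return self.cycles


ops = [
    MemOp(cpu=0, addr=0x1000, is_write=False),  # miss
    MemOp(cpu=0, addr=0x1008, is_write=False),  # same 64-byte block: hit
    MemOp(cpu=1, addr=0x1000, is_write=True),   # CPU 1's private cache: miss
]
cycles = FunctionalSimulator(num_cpus=2).run(ops, TimingModule())
print(cycles)  # [102, 100]
```

CPU 0's second load hits in the block fetched by its first load, so CPU 0 accumulates 102 cycles while CPU 1 pays a single 100-cycle miss. Because the functional side never reads timing state, a different timing module can be swapped in without changing what the workload computes, which is the main appeal of this kind of decoupling.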