The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Full-system timing-first simulation
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol
IEEE Transactions on Parallel and Distributed Systems
Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers
Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers
Complete Computer System Simulation: The SimOS Approach
IEEE Parallel & Distributed Technology: Systems & Technology
Verifying a Multiprocessor Cache Controller Using Random Test Generation
IEEE Design & Test
Protocol Verification as a Hardware Design Aid
ICCD '92 Proceedings of the 1991 IEEE International Conference on Computer Design on VLSI in Computer & Processors
The Alpha 21364 Network Architecture
HOTI '01 Proceedings of the The Ninth Symposium on High Performance Interconnects
Fingerprinting: bounding soft-error detection latency and bandwidth
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Improving Multiple-CMP Systems Using Token Coherence
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Maximizing CMP Throughput with Mediocre Cores
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
DRAMsim: a memory system simulator
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations
IEEE Transactions on Computers
Cooperative Caching for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Interconnect-Aware Coherence Protocols for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Spin Detection Hardware for Improved Management of Multithreaded Systems
IEEE Transactions on Parallel and Distributed Systems
Architectural support for operating system-driven CMP cache management
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
The M5 Simulator: Modeling Networked Systems
IEEE Micro
A regulated transitive reduction (RTR) for longer memory race recording
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Supporting nested transactional memory in logTM
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Coherence Ordering for Ring-based Chip Multiprocessors
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Nonblocking transactions without indirection using alert-on-update
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Making the fast case common and the uncommon case simple in unbounded transactional memory
Proceedings of the 34th annual international symposium on Computer architecture
Virtual hierarchies to support server consolidation
Proceedings of the 34th annual international symposium on Computer architecture
Performance pathologies in hardware transactional memory
Proceedings of the 34th annual international symposium on Computer architecture
An integrated hardware-software approach to flexible transactional memory
Proceedings of the 34th annual international symposium on Computer architecture
Rotary router: an efficient architecture for CMP interconnection networks
Proceedings of the 34th annual international symposium on Computer architecture
ParallAX: an architecture for real-time physics
Proceedings of the 34th annual international symposium on Computer architecture
Architectural implications of brick and mortar silicon manufacturing
Proceedings of the 34th annual international symposium on Computer architecture
Managing energy-performance tradeoffs for multithreaded applications on multiprocessor architectures
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Cooperative cache partitioning for chip multiprocessors
Proceedings of the 21st annual international conference on Supercomputing
The impact of wrong-path memory references in cache-coherent multiprocessor systems
Journal of Parallel and Distributed Computing
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Adaptive transaction scheduling for transactional memory systems
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
TokenTM: Efficient Execution of Large Transactions with Hardware Transactional Memory
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Flexible Decoupled Transactional Memory Support
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Rerun: Exploiting Episodes for Lightweight Memory Race Recording
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Energy-efficient MESI cache coherence with pro-active snoop filtering for multicore microprocessors
Proceedings of the 13th international symposium on Low power electronics and design
Full-system chip multiprocessor power evaluations using FPGA-based emulation
Proceedings of the 13th international symposium on Low power electronics and design
Reducing the Interconnection Network Cost of Chip Multiprocessors
NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
SP-NUCA: a cost effective dynamic non-uniform cache architecture
ACM SIGARCH Computer Architecture News
The Journal of Supercomputing
MCjammer: adaptive verification for multi-core designs
Proceedings of the conference on Design, automation and test in Europe
Extending CC-NUMA systems to support write update optimizations
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Distributed cooperative caching
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Improving support for locality and fine-grain sharing in chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Protocol offload analysis by simulation
Journal of Systems Architecture: the EUROMICRO Journal
Version management alternatives for hardware transactional memory
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
MC-Sim: an efficient simulation tool for MPSoC designs
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Frequent value compression in packet-based NoC architectures
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Two hardware-based approaches for deterministic multiprocessor replay
Communications of the ACM - One Laptop Per Child: Vision vs. Reality
Token tenure: PATCHing token counting using directory-based cache coherence
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Adaptive data compression for high-performance low-power on-chip networks
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Notary: Hardware techniques to enhance signatures
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Toward a multicore architecture for real-time ray-tracing
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Extending concurrency of transactional memory programs by using value prediction
Proceedings of the 6th ACM conference on Computing frontiers
ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Refereeing conflicts in hardware transactional memory
Proceedings of the 23rd international conference on Supercomputing
Limited early value communication to improve performance of transactional memory
Proceedings of the 23rd international conference on Supercomputing
A durable and energy efficient main memory using phase change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
Memory mapped ECC: low-cost error protection for last level caches
Proceedings of the 36th annual international symposium on Computer architecture
Proceedings of the 36th annual international symposium on Computer architecture
Triplet-based topology for on-chip networks
WSEAS Transactions on Computers
ACM SIGARCH Computer Architecture News
NZTM: nonblocking zero-indirection transactional memory
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Exploring concentration and channel slicing in on-chip network router
NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Dealing with Traffic-Area Trade-Off in Direct Coherence Protocols for Many-Core CMPs
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
An Efficient Lightweight Shared Cache Design for Chip Multiprocessors
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
L1 Collective Cache: Managing Shared Data for Chip Multiprocessors
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
Hybrid Techniques for Fast Multicore Simulation
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
REPAS: Reliable Execution for Parallel ApplicationS in Tiled-CMPs
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
MPTLsim: a simulator for X86 multicore processors
Proceedings of the 46th Annual Design Automation Conference
Full-system simulation of distributed memory multicomputers
Cluster Computing
A performance evaluation of 2D-mesh, ring, and crossbar interconnects for chip multi-processors
Proceedings of the 2nd International Workshop on Network on Chip Architectures
Future scaling of processor-memory interfaces
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Flexible cache error protection using an ECC FIFO
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
In-network coherence filtering: snoopy coherence without broadcasts
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Ordering decoupled metadata accesses in multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
SHARP control: controlled shared cache management in chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Accurately evaluating application performance in simulated hybrid multi-tasking systems
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
New techniques for simulating high performance MPI applications on large storage networks
The Journal of Supercomputing
Flexible architectural support for fine-grain scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Specifying and dynamically verifying address translation-aware memory consistency
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Inter-core cooperative TLB for chip multiprocessors
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Virtualized and flexible ECC for main memory
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
An analysis of on-chip interconnection networks for large-scale chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Network interfaces for programmable NICs and multicore platforms
Computer Networks: The International Journal of Computer and Telecommunications Networking
A scalable organization for distributed directories
Journal of Systems Architecture: the EUROMICRO Journal
Software—Practice & Experience
Accelerating multi-core simulators
Proceedings of the 2010 ACM Symposium on Applied Computing
Proceedings of the 20th symposium on Great lakes symposium on VLSI
On-chip communication and synchronization mechanisms with cache-integrated network interfaces
Proceedings of the 7th ACM international conference on Computing frontiers
Exploit temporal locality of shared data in SRC enabled CMP
NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
Directory-based conflict detection in hardware transactional memory
HiPC'08 Proceedings of the 15th international conference on High performance computing
Fault-tolerant cache coherence protocols for CMPs: evaluation and trade-offs
HiPC'08 Proceedings of the 15th international conference on High performance computing
LRU-PEA: a smart replacement policy for non-uniform cache architectures on chip multiprocessors
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
A session key caching and prefetching scheme for secure communication in cluster systems
Journal of Parallel and Distributed Computing
Cache topology aware computation mapping for multicores
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
The auction: optimizing banks usage in Non-Uniform Cache Architectures
Proceedings of the 24th ACM International Conference on Supercomputing
Characterizing the soft error vulnerability of multicores running multithreaded applications
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
WiDGET: Wisconsin decoupled grid execution tiles
Proceedings of the 37th annual international symposium on Computer architecture
Forwardflow: a scalable core for power-constrained CMPs
Proceedings of the 37th annual international symposium on Computer architecture
The virtual write queue: coordinating DRAM and last-level cache policies
Proceedings of the 37th annual international symposium on Computer architecture
Timetraveler: exploiting acyclic races for optimizing memory race recording
Proceedings of the 37th annual international symposium on Computer architecture
A case for FAME: FPGA architecture model execution
Proceedings of the 37th annual international symposium on Computer architecture
Proceedings of the 37th annual international symposium on Computer architecture
Addressing Manufacturing Challenges with Cost-Efficient Fault Tolerant Routing
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Ultra Fine-Grained Run-Time Power Gating of On-chip Routers for CMPs
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Power-Efficient and High-Performance Multi-level Hybrid Nanophotonic Interconnect for Multicores
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Performance Evaluation of a Multicore System with Optically Connected Memory Modules
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Virtual channels vs. multiple physical networks: a comparative analysis
Proceedings of the 47th Design Automation Conference
Trace-driven optimization of networks-on-chip configurations
Proceedings of the 47th Design Automation Conference
RAMP gold: an FPGA-based architecture simulator for multiprocessors
Proceedings of the 47th Design Automation Conference
Hardware transactional memory: A high performance parallel programming model
Journal of Systems Architecture: the EUROMICRO Journal
Token tenure and PATCH: A predictive/adaptive token-counting hybrid
ACM Transactions on Architecture and Code Optimization (TACO)
Implementation tradeoffs in the design of flexible transactional memory support
Journal of Parallel and Distributed Computing
A practical way to extend shared memory support beyond a motherboard at low cost
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Subspace snooping: filtering snoops with operating system support
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proximity coherence for chip multiprocessors
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
SPACE: sharing pattern-based directory coherence for multicore scalability
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
ReSim, a trace-driven, reconfigurable ILP processor simulator
Proceedings of the Conference on Design, Automation and Test in Europe
CASPAR: hardware patching for multi-core processors
Proceedings of the Conference on Design, Automation and Test in Europe
Group-caching for NoC based multicore cache coherent systems
Proceedings of the Conference on Design, Automation and Test in Europe
FastFwd: an efficient hardware acceleration technique for trace-driven network-on-chip simulation
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Accelerating data movement on future chip multi-processors
Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
IDEAL'10 Proceedings of the 11th international conference on Intelligent data engineering and automated learning
On-Chip Network Evaluation Framework
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
CPM in CMPs: Coordinated Power Management in Chip-Multiprocessors
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Power-efficient spilling techniques for chip multiprocessors
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Architectural support for thread communications in multi-core processors
Parallel Computing
An analytical network performance model for SIMD processor CSX600 interconnects
Journal of Systems Architecture: the EUROMICRO Journal
A generic adaptive path-based routing method for MPSoCs
Journal of Systems Architecture: the EUROMICRO Journal
Thread criticality support in on-chip networks
Proceedings of the Third International Workshop on Network on Chip Architectures
A variable-pipeline on-chip router optimized to traffic pattern
Proceedings of the Third International Workshop on Network on Chip Architectures
A power-efficient network on-chip topology
Proceedings of the Fifth International Workshop on Interconnection Network Architecture: On-Chip, Multi-Chip
Architectural Support for Fair Reader-Writer Locking
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive Flow Control for Robust Performance and Energy
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Fractal Coherence: Scalably Verifiable Cache Coherence
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Probabilistic Distance-Based Arbitration: Providing Equality of Service for Many-Core CMPs
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Hardware Support for Relaxed Concurrency Control in Transactional Memory
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A Dynamically Adaptable Hardware Transactional Memory
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
COREMU: a scalable and portable parallel full-system emulator
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Virtualizing network-on-chip resources in chip-multiprocessors
Microprocessors & Microsystems
Design of a performance enhanced and power reduced dual-crossbar Network-on-Chip (NoC) architecture
Microprocessors & Microsystems
Adaptive timekeeping replacement: Fine-grained capacity management for shared CMP caches
ACM Transactions on Architecture and Code Optimization (TACO)
CoQoS: Coordinating QoS-aware shared resources in NoC-based SoCs
Journal of Parallel and Distributed Computing
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
A statistical performance model of the opteron processor
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Studying inter-core data reuse in multicores
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Multiset signatures for transactional memory
Proceedings of the international conference on Supercomputing
ZEBRA: a data-centric, hybrid-policy hardware transactional memory design
Proceedings of the international conference on Supercomputing
Predictive coordination of multiple on-chip resources for chip multiprocessors
Proceedings of the international conference on Supercomputing
Karma: scalable deterministic record-replay
Proceedings of the international conference on Supercomputing
A vertical bubble flow network using inductive-coupling for 3-D CMPs
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Exploring partitioning methods for 3D Networks-on-Chip utilizing adaptive routing model
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
A distributed and topology-agnostic approach for on-line NoC testing
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
DART: a programmable architecture for NoC simulation on FPGAs
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Inferring packet dependencies to improve trace based simulation of on-chip networks
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks
Proceedings of the 38th annual international symposium on Computer architecture
Sampling + DMR: practical and low-overhead permanent fault detection
Proceedings of the 38th annual international symposium on Computer architecture
An abacus turn model for time/space-efficient reconfigurable routing
Proceedings of the 38th annual international symposium on Computer architecture
Studying inter-core data reuse in multicores
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
An energy-efficient adaptive hybrid cache
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
NoC frequency scaling with flexible-pipeline routers
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
ACM SIGARCH Computer Architecture News
DRAIN: distributed recovery architecture for inaccessible nodes in multi-core chips
Proceedings of the 48th Design Automation Conference
Enabling system-level modeling of variation-induced faults in networks-on-chips
Proceedings of the 48th Design Automation Conference
A helper thread based dynamic cache partitioning scheme for multithreaded applications
Proceedings of the 48th Design Automation Conference
A reuse-aware prefetching scheme for scratchpad memory
Proceedings of the 48th Design Automation Conference
Design of a new cloud computing simulation platform
ICCSA'11 Proceedings of the 2011 international conference on Computational science and its applications - Volume Part III
Evaluation of low-overhead organizations for the directory in future many-core CMPs
Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
Multilayer cache partitioning for multiprogram workloads
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Token3D: reducing temperature in 3d die-stacked CMPs through cycle-level power control mechanisms
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Unified locality-sensitive signatures for transactional memory
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Poster: dissection the version management schemes in hardware transactional memory systems
ACM SIGMETRICS Performance Evaluation Review - Special Issue on IFIP PERFORMANCE 2011- 29th International Symposium on Computer Performance, Modeling, Measurement and Evaluation
HC-Sim: a fast and exact l1 cache simulator with scratchpad memory co-simulation support
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Exploiting temporal decoupling to accelerate trace-driven NoC emulation
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Kismet: parallel speedup estimates for serial programs
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
A Canonical Multicore Architecture for Network Routers
Proceedings of the 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems
HPC-Mesh: A Homogeneous Parallel Concentrated Mesh for Fault-Tolerance and Energy Savings
Proceedings of the 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems
Compiler support for concurrency synchronization
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part I
An FPGA-based scalable simulation accelerator for tile architectures
ACM SIGARCH Computer Architecture News
ABS: A low-cost adaptive controller for prefetching in a banked shared last-level cache
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
DAPSCO: Distance-aware partially shared cache organization
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Hardware transactional memory with software-defined conflicts
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
On the simulation of large-scale architectures using multiple application abstraction levels
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
The migration prefetcher: Anticipating data promotion in dynamic NUCA caches
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
VSim: Simulating multi-server setups at near native hardware speed
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Acceleration techniques for chip-multiprocessor simulator debug
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
A highly robust distributed fault-tolerant routing algorithm for NoCs with localized rerouting
Proceedings of the 2012 Interconnection Network Architecture: On-Chip, Multi-Chip Workshop
Switch-based packing technique to reduce traffic and latency in token coherence
Journal of Parallel and Distributed Computing
Improving shared cache behavior of multithreaded object-oriented applications in multicores
Proceedings of the International Conference on Computer-Aided Design
Assuring application-level correctness against soft errors
Proceedings of the International Conference on Computer-Aided Design
Co-design of channel buffers and crossbar organizations in NoCs architectures
Proceedings of the International Conference on Computer-Aided Design
Improving System Energy Efficiency with Memory Rank Subsetting
ACM Transactions on Architecture and Code Optimization (TACO)
Efficient trace-driven metaheuristics for optimization of networks-on-chip configurations
Proceedings of the International Conference on Computer-Aided Design
Efficient memory management of a hierarchical and a hybrid main memory for MN-MATE platform
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Applying transactional memory to concurrency bugs
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Hardware support for OpenMP collective operations
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Minimalist open-page: a DRAM page-mode scheduling policy for the many-core era
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A systematic methodology to develop resilient cache coherence protocols
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
FeatherWeight: low-cost optical arbitration with QoS support
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Manager-client pairing: a framework for implementing coherence hierarchies
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A data layout optimization framework for NUCA-based multicores
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Full system simulation of many-core heterogeneous SoCs using GPU and QEMU semihosting
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Improving performance by reducing aborts in hardware transactional memory
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
SRP: symbiotic resource partitioning of the memory hierarchy in CMPs
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Efficient transaction nesting in hardware transactional memory
ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Is reuse distance applicable to data locality analysis on chip multiprocessors?
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Neighborhood-aware data locality optimization for NoC-based multicores
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Reliability-aware platform optimization for 3D chip multi-processors
The Journal of Supercomputing
Network-on-Chip virtualization in Chip-Multiprocessor Systems
Journal of Systems Architecture: the EUROMICRO Journal
Improving coherence protocol reactiveness by trading bandwidth for latency
Proceedings of the 9th conference on Computing Frontiers
Cost-effective power delivery to support per-core voltage domains for power-constrained processors
Proceedings of the 49th Annual Design Automation Conference
Attackboard: a novel dependency-aware traffic generator for exploring NoC design space
Proceedings of the 49th Annual Design Automation Conference
Quick detection of difficult bugs for effective post-silicon validation
Proceedings of the 49th Annual Design Automation Conference
A hybrid NoC design for cache coherence optimization for chip multiprocessors
Proceedings of the 49th Annual Design Automation Conference
Architecture support for accelerator-rich CMPs
Proceedings of the 49th Annual Design Automation Conference
Virtual-machine-based emulation of future generation high-performance computing systems
International Journal of High Performance Computing Applications
Real-time network-on-chip simulation modeling
Proceedings of the 5th International ICST Conference on Simulation Tools and Techniques
Proceedings of the 26th ACM international conference on Supercomputing
Simulating the future kilo-x86-64 core processors and their infrastructure
Proceedings of the 45th Annual Simulation Symposium
ASCIB: adaptive selection of cache indexing bits for removing conflict misses
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Energy-efficient non-minimal path on-chip interconnection network for heterogeneous systems
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Power-aware performance increase via core/uncore reinforcement control for chip-multiprocessors
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
BiN: a buffer-in-NUCA scheme for accelerator-rich CMPs
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Static and dynamic co-optimizations for blocks mapping in hybrid caches
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
CHARM: a composable heterogeneous accelerator-rich microprocessor
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Something old and something new: P-states can borrow microarchitecture techniques too
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
A new degree of freedom for memory allocation in clusters
Cluster Computing
CCTR: An efficient point-to-point memory race recorder implemented in chunks
Microprocessors & Microsystems
Asymmetric Cache Coherency: Policy Modifications to Improve Multicore Performance
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Efficient implementation of globally-aware network flow control
Journal of Parallel and Distributed Computing
APCR: an adaptive physical channel regulator for on-chip interconnects
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Complexity-effective multicore coherence
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
PS-Dir: a scalable two-level directory cache
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Shared hardware data structures for hard real-time systems
Proceedings of the tenth ACM international conference on Embedded software
Efficient traffic aware power management in multicore communications processors
Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems
LIGERO: A light but efficient router conceived for cache-coherent chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Exploiting reuse locality on inclusive shared last-level caches
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
An integrated pseudo-associativity and relaxed-order approach to hardware transactional memory
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
SCIN-cache: Fast speculative versioning in multithreaded cores
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Stream arbitration: Towards efficient bandwidth utilization for emerging on-chip interconnects
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
A high-efficiency low-cost heterogeneous 3D network-on-chip design
Proceedings of the Fifth International Workshop on Network on Chip Architectures
Bus and memory protection through chain-generated and tree-verified IV for multiprocessors systems
Future Generation Computer Systems
Prototyping hardware support for irregular applications
Proceedings of the 2013 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools
Architecture support for custom instructions with memory operations
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
High-performance and low-energy buffer mapping method for multiprocessor DSP systems
ACM Transactions on Embedded Computing Systems (TECS)
Cluster-based topologies for 3D Networks-on-Chip using advanced inter-layer bus architecture
Journal of Computer and System Sciences
ACM Transactions on Architecture and Code Optimization (TACO)
ACM Transactions on Architecture and Code Optimization (TACO)
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
NoRD: Node-Router Decoupling for Effective Power-gating of On-Chip Routers
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Addressing End-to-End Memory Access Latency in NoC-Based Multicores
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
TRANSIT: specifying protocols with concolic snippets
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
An efficient deterministic record-replay with separate dependencies
Computers and Electrical Engineering
CMP off-chip bandwidth scheduling guided by instruction criticality
Proceedings of the 27th international ACM conference on International conference on supercomputing
Prefetching and cache management using task lifetimes
Proceedings of the 27th international ACM conference on International conference on supercomputing
Reuse-based online models for caches
Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems
Replacement techniques for dynamic NUCA cache designs on CMPs
The Journal of Supercomputing
A survey on cache tuning from a power/energy perspective
ACM Computing Surveys (CSUR)
Proceedings of the ACM International Conference on Computing Frontiers
Orchestrator: a low-cost solution to reduce voltage emergencies for multi-threaded applications
Proceedings of the Conference on Design, Automation and Test in Europe
Overcoming post-silicon validation challenges through quick error detection (QED)
Proceedings of the Conference on Design, Automation and Test in Europe
Proactive aging management in heterogeneous NoCs through a criticality-driven routing approach
Proceedings of the Conference on Design, Automation and Test in Europe
Cache coherence enabled adaptive refresh for volatile STT-RAM
Proceedings of the Conference on Design, Automation and Test in Europe
CPU transparent protection of OS kernel and hypervisor integrity with programmable DRAM
Proceedings of the 40th Annual International Symposium on Computer Architecture
ZSim: fast and accurate microarchitectural simulation of thousand-core systems
Proceedings of the 40th Annual International Symposium on Computer Architecture
A new perspective for efficient virtual-cache coherence
Proceedings of the 40th Annual International Symposium on Computer Architecture
Protozoa: adaptive granularity cache coherence
Proceedings of the 40th Annual International Symposium on Computer Architecture
A complete self-testing and self-configuring NoC infrastructure for cost-effective MPSoCs
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Wireless Health Systems, On-Chip and Off-Chip Network Architectures
Locality-aware task management for unstructured parallelism: a quantitative limit study
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Design space exploration of FinFET cache
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Exploring the vulnerability of CMPs to soft errors with 3D stacked nonvolatile memory
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Dynamically reconfigurable hybrid cache: an energy-efficient last-level cache design
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
CATRA- congestion aware trapezoid-based routing algorithm for on-chip networks
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
An MILP-based aging-aware routing algorithm for NoCs
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hardware and software support for fine-grained memory access control and encapsulation in C++
Proceedings of the 2013 companion publication for conference on Systems, programming, & applications: software for humanity
Optimal placement of vertical connections in 3D Network-on-Chip
Journal of Systems Architecture: the EUROMICRO Journal
Memory-centric system interconnect design with hybrid memory cubes
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Meeting midway: improving CMP performance with memory-side prefetching
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
McRouter: multicast within a router for high performance network-on-chips
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Low-energy volatile STT-RAM cache design using cache-coherence-enabled adaptive refresh
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Efficient multicast schemes for 3-D Networks-on-Chip
Journal of Systems Architecture: the EUROMICRO Journal
Decoupled compressed cache: exploiting spatial locality for energy-optimized compressed caching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
uDIREC: unified diagnosis and reconfiguration for frugal bypass of NoC faults
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Imbalanced cache partitioning for balanced data-parallel programs
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
The reuse cache: downsizing the shared last-level cache
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Modeling the impact of permanent faults in caches
ACM Transactions on Architecture and Code Optimization (TACO)
Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
High-performance fractal coherence
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Techniques to improve performance in requester-wins hardware transactional memory
ACM Transactions on Architecture and Code Optimization (TACO)
TornadoNoC: A lightweight and scalable on-chip network architecture for the many-core era
ACM Transactions on Architecture and Code Optimization (TACO)
The case of using multiple streams in streaming
International Journal of Automation and Computing
Thread-criticality aware dynamic cache reconfiguration in multi-core system
Proceedings of the International Conference on Computer-Aided Design
Efficient execution of speculative threads and transactions with hardware transactional memory
Future Generation Computer Systems
Dual partitioning multicasting for high-performance on-chip networks
Journal of Parallel and Distributed Computing
Unified reliability estimation and management of NoC based chip multiprocessors
Microprocessors & Microsystems
Bi-LCQ: A low-weight clustering-based Q-learning approach for NoCs
Microprocessors & Microsystems
An effectiveness-based adaptive cache replacement policy
Microprocessors & Microsystems
PAIS: Parallelism-aware interconnect scheduling in multicores
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols
Journal of Parallel and Distributed Computing
A novel architecture for ahead branch prediction
Frontiers of Computer Science: Selected Publications from Chinese Universities
Hi-index | 0.00 |
The Wisconsin Multifacet Project has created a simulation toolset to characterize and evaluate the performance of multiprocessor hardware systems commonly used as database and web servers. We leverage an existing full-system functional simulation infrastructure (Simics [14]) as the basis around which to build a set of timing simulator modules for modeling the timing of the memory system and microprocessors. This simulator infrastructure enables us to run architectural experiments using a suite of scaled-down commercial workloads [3]. To enable other researchers to more easily perform such research, we have released these timing simulator modules as the Multifacet General Execution-driven Multiprocessor Simulator (GEMS) Toolset, release 1.0, under GNU GPL [9].