A fast algorithm for particle simulations
Journal of Computational Physics
ACM Transactions on Mathematical Software (TOMS)
The parallel decomposition and implementation of an integrated circuit global router
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Journal of Parallel and Distributed Computing
Parallel hierarchical N-body methods
Parallel hierarchical N-body methods
LocusRoute: a parallel global router for standard cells
DAC '88 Proceedings of the 25th ACM/IEEE Design Automation Conference
Asynchronous distributed simulation via a sequence of parallel computations
Communications of the ACM - Special issue on simulation modeling and statistical computing
Numerical Initial Value Problems in Ordinary Differential Equations
Numerical Initial Value Problems in Ordinary Differential Equations
Computer Solution of Large Sparse Positive Definite
Computer Solution of Large Sparse Positive Definite
Analysis of Parallelism and Deadlocks in Distributed-Time Logic Simulation
Analysis of Parallelism and Deadlocks in Distributed-Time Logic Simulation
Tango: A Multiprocessor Simulation and Tracing System
Tango: A Multiprocessor Simulation and Tracing System
Cooperative shared memory: software and hardware for scalable multiprocessor
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Data locality and load balancing in COOL
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Cooperative shared memory: software and hardware for scalable multiprocessors
ACM Transactions on Computer Systems (TOCS)
The detection and elimination of useless misses in multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An adaptive cache coherence protocol optimized for migratory sharing
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Mechanisms for cooperative shared memory
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The performance of cache-coherent ring-based multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Dynamic control of performance monitoring on large scale parallel systems
ICS '93 Proceedings of the 7th international conference on Supercomputing
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The influence of random delays on parallel execution times
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Effectiveness of trace sampling for performance debugging tools
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Developing parallel applications using high-performance simulation
PADD '93 Proceedings of the 1993 ACM/ONR workshop on Parallel and distributed debugging
An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
An evaluation of directory protocols for medium-scale shared-memory multiprocessors
ICS '94 Proceedings of the 8th international conference on Supercomputing
Performance evaluation of hybrid hardware and software distributed shared memory protocols
ICS '94 Proceedings of the 8th international conference on Supercomputing
Cost/performance of a parallel computer simulator
PADS '94 Proceedings of the eighth workshop on Parallel and distributed simulation
Evaluating the memory overhead required for COMA architectures
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Impact of sharing-based thread placement on multithreaded architectures
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Combined performance gains of simple cache protocol extensions
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A performance study of software and hardware data prefetching schemes
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Scheduling and page migration for multiprocessor compute servers
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Surpassing the TLB performance of superpages with less operating system support
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The performance impact of flexibility in the Stanford FLASH multiprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Simple compiler algorithms to reduce ownership overhead in cache coherence protocols
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Fine-grain access control for distributed shared memory
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Interleaving: a multithreading technique targeting multiprocessors and workstations
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The effectiveness of multiple hardware contexts
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
Optimizing parallel programs with explicit synchronization
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimistic active messages: a mechanism for scheduling communication with computation
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Future applicability of bus-based shared memory multiprocessors
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Efficient shared memory with minimal hardware support
ACM SIGARCH Computer Architecture News
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Boosting the performance of hybrid snooping cache protocols
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A new page table for 64-bit address spaces
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
CRL: high-performance all-software distributed shared memory
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Message passing versus distributed shared memory on networks of workstations
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Lazy release consistency for hardware-coherent multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Communication optimizations for parallel computing using data access information
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Techniques for reducing overheads of shared-memory multiprocessing
ICS '95 Proceedings of the 9th international conference on Supercomputing
A compiler algorithm that reduces read latency in ownership-based cache coherence protocols
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Evaluating the impact of advanced memory systems on compiler-parallelized codes
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Efficient strategies for software-only protocols in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Evaluation of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Commutativity analysis: a new analysis framework for parallelizing compilers
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Evaluation of design alternatives for a multiprocessor microprocessor
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Missing the memory wall: the case for processor/memory integration
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Effective distributed scheduling of parallel workloads
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Using dataflow analysis techniques to reduce ownership overhead in cache coherence protocols
ACM Transactions on Programming Languages and Systems (TOPLAS)
An evaluation of memory consistency models for shared-memory systems with ILP processors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Shasta: a low overhead, software-only approach for supporting fine-grain shared memory
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Operating system support for improving data locality on CC-NUMA compute servers
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Synchronization hardware for networks of workstations: performance vs. cost
ICS '96 Proceedings of the 10th international conference on Supercomputing
Evaluating virtual channels for cache-coherent shared-memory multiprocessors
ICS '96 Proceedings of the 10th international conference on Supercomputing
Conservative circuit simulation on shared-memory multiprocessors
PADS '96 Proceedings of the tenth workshop on Parallel and distributed simulation
Optimistic simulation of parallel architectures using program executables
PADS '96 Proceedings of the tenth workshop on Parallel and distributed simulation
Scheduler-conscious synchronization
ACM Transactions on Computer Systems (TOCS)
Modeling cost/performance of a parallel computer simulator
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Characterizing the Memory Behavior of Compiler-Parallelized Applications
IEEE Transactions on Parallel and Distributed Systems
Performance of Multistage Bus Networks for a Distributed Shared Memory Multiprocessor
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Dynamic feedback: an effective technique for adaptive computing
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Compiler support for decoupled virtual shared memory systems
PODC '97 Proceedings of the sixteenth annual ACM symposium on Principles of distributed computing
Parallel timing simulation on a distributed memory multiprocessor
ICCAD '93 Proceedings of the 1993 IEEE/ACM international conference on Computer-aided design
ICS '97 Proceedings of the 11th international conference on Supercomputing
Synchronization transformations for parallel computing
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Ace: linguistic mechanisms for customizable protocols
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Tradeoffs between false sharing and aggregation in software distributed shared memory
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
The interaction of software prefetching with ILP processors in shared-memory systems
Proceedings of the 24th annual international symposium on Computer architecture
VM-based shared memory on low-latency, remote-memory-access networks
Proceedings of the 24th annual international symposium on Computer architecture
Efficient synchronization: let them eat QOLB
Proceedings of the 24th annual international symposium on Computer architecture
Cashmere-2L: software coherent shared memory on a clustered remote-write network
Proceedings of the sixteenth ACM symposium on Operating systems principles
Commutativity analysis: a new analysis technique for parallelizing compilers
ACM Transactions on Programming Languages and Systems (TOPLAS)
Analytical Prediction of Performance for Cache Coherence Protocols
IEEE Transactions on Computers
Performance evaluation of the Orca shared-object system
ACM Transactions on Computer Systems (TOCS)
A study of three dynamic approaches to handle widely shared data in shared-memory multiprocessors
ICS '98 Proceedings of the 12th international conference on Supercomputing
Design Verification of the S3.mp Cache-Coherent Shared-Memory System
IEEE Transactions on Computers
Informing memory operations: memory performance feedback mechanisms and their applications
ACM Transactions on Computer Systems (TOCS)
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors
Proceedings of the 25th annual international symposium on Computer architecture
Analytic evaluation of shared-memory systems with ILP processors
Proceedings of the 25th annual international symposium on Computer architecture
Protocol-based data-race detection
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Predicting the performance of distributed virtual shared-memory applications
IBM Systems Journal
The Stanford FLASH multiprocessor
25 years of the international symposia on Computer architecture (selected papers)
Tempest and typhoon: user-level shared memory
25 years of the international symposia on Computer architecture (selected papers)
The MIT Alewife machine: architecture and performance
25 years of the international symposia on Computer architecture (selected papers)
IEEE Transactions on Computers
Hardware Support for Flexible Distributed Shared Memory
IEEE Transactions on Computers
The design, implementation, and evaluation of Jade
ACM Transactions on Programming Languages and Systems (TOPLAS)
The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors
IEEE Transactions on Computers - Special issue on cache memory and related problems
The pool of subsectors cache design
ICS '99 Proceedings of the 13th international conference on Supercomputing
Eliminating synchronization bottlenecks in object-based programs using adaptive replication
ICS '99 Proceedings of the 13th international conference on Supercomputing
Shared virtual memory with automatic update support
ICS '99 Proceedings of the 13th international conference on Supercomputing
Low-level router design and its impact on supercomputer system performance
ICS '99 Proceedings of the 13th international conference on Supercomputing
Data management in networks: experimental evaluation of a provably good strategy
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
IEEE Transactions on Parallel and Distributed Systems
Eliminating synchronization overhead in automatically parallelized programs using dynamic feedback
ACM Transactions on Computer Systems (TOCS)
Code transformations to improve memory parallelism
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Ace: a language for parallel programming with customizable protocols
ACM Transactions on Computer Systems (TOCS)
Task Spreading and Shrinking on Multiprocessor Systems and Networks of Workstations
IEEE Transactions on Parallel and Distributed Systems
Design Alternatives of Multithreaded Architecture
International Journal of Parallel Programming
ACM Transactions on Computer Systems (TOCS)
IEEE Transactions on Parallel and Distributed Systems
Design and Evaluation of a Switch Cache Architecture for CC-NUMA Multiprocessors
IEEE Transactions on Computers
Memory Hierarchy Considerations for Cost-Effective Cluster Computing
IEEE Transactions on Computers
Compiler-directed shared-memory communication for iterative parallel applications
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Dynamic computation migration in DSM systems
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Simulation of the 3 dimensional cascade flow with numerical wind tunnel (NWT)
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Improving fine-grained irregular shared-memory benchmarks by data reordering
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Evaluating the performance limitations of MPMD communication
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Speculative lock elision: enabling highly concurrent multithreaded execution
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
WOSP '02 Proceedings of the 3rd international workshop on Software and performance
Run-time support for distributed sharing in safe languages
ACM Transactions on Computer Systems (TOCS)
ACM Transactions on Computer Systems (TOCS)
An Effective Logical Cache for a Clustered LRC-Based DSM System
Cluster Computing
Application-specific protocols for user-level shared memory
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Paging tradeoffs in distributed-shared-memory multiprocessors
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Performance Tuning Software DSM Applications using Visualisation
The Journal of Supercomputing
Transactional lock-free execution of lock-based programs
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Design and analysis of static memory management policies for CC-NUMA Multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Complete Computer System Simulation: The SimOS Approach
IEEE Parallel & Distributed Technology: Systems & Technology
Using the Cowichan Problems to Assess the Usability of Orca
IEEE Parallel & Distributed Technology: Systems & Technology
Fault Tolerance and Configurability in DSM Coherence Protocols
IEEE Concurrency
Benchmarking with Real Industrial Applications: The SPEC High-Performance Group
IEEE Computational Science & Engineering
Performance Evaluation of the Slotted Ring Multiprocessor
IEEE Transactions on Computers
Sequential Hardware Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Eliminating synchronization bottlenecks using adaptive replication
ACM Transactions on Programming Languages and Systems (TOPLAS)
Hybrid compiler/hardware prefetching for multiprocessors using low-overhead cache miss traps
ICPP '97 Proceedings of the international Conference on Parallel Processing
An adaptive sequential prefetching scheme in shared-memory multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
Application-Controlled Coherence Protocols for Scope Consistent Software DSMs
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
A Virtual Memory Model for Parallel Supercomputers
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Maximizing Speedup through Self-Tuning of Processor Allocation
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Categorizing Network Traffic in Update-Based Protocols on Scalable Multiprocessors
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Implementation of Scalable Blocking Locks Using an Adaptive Thread Scheduler
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Performance of On-Chip Multiprocessors for Vision Tasks
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Priority Based Messaging for Software Distributed Shared Memory
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
IPDPS '00/JSSPP '00 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
A Node Count-Independent Logical Clock for Scaling Lazy Release Consistency Protocol
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
A Graph-Oriented Task Manager for Small Multiprocessor Systems
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Optimizing Mutual Exclusion Synchronization in Explicitly Parallel Programs
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Run-Time Support for Distributed Sharing in Typed Languages
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Commutativity Analysis: A Technique for Automatically Parallelizing Pointer-Based Computations
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
A New Parallelism Management Scheme for Multiprocessor Systems
ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Multiple-writer entry consistency
Cluster computing
On cache memory hierarchy for Chip-Multiprocessor
ACM SIGARCH Computer Architecture News
Compactly representing parallel program executions
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Inferential queueing and speculative push for reducing critical communication latencies
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Shared memory computing on SP2: JIAJIA approach
CASCON '98 Proceedings of the 1998 conference of the Centre for Advanced Studies on Collaborative research
Supporting multiple parallel programming paradigms on top of the Millipede virtual parallel machine
HIPS '97 Proceedings of the 1997 Workshop on High-Level Programming Models and Supportive Environments (HIPS '97)
Effectiveness of hardware-based stride and sequential prefetching in shared-memory multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Software cache coherence for large scale multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Two techniques for improving performance on bus-based multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Two Adaptive Hybrid Cache Coherency Protocols
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Using memory-mapped network interfaces to improve the performance of distributed shared memory
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Bus-based COMA-reducing traffic in shared-bus multiprocessors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
The impact of shared-cache clustering in small-scale shared-memory multiprocessors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
A Hierarchical Memory Directory Scheme Via Extending SCI for Large-Scale Multiprocessors
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
A model for parallel simulation of distributed shared memory
MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Impact of Switch Design on the Application Performance of Cache-Coherent Multiprocessors
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Object Consistency - A New Model for Distributed Memory Systems
IWOOOS '95 Proceedings of the 4th International Workshop on Object-Orientation in Operating Systems
Virtual memory on data diffusion architectures
Parallel Computing
Parallel program performance prediction using deterministic task graph analysis
ACM Transactions on Computer Systems (TOCS)
Synthesis of Partitioned Shared Memory Architectures for Energy-Efficient Multi-Processor SoC
Proceedings of the conference on Design, automation and test in Europe - Volume 1
A first glance at Kilo-instruction based multiprocessors
Proceedings of the 1st conference on Computing frontiers
Packetization and routing analysis of on-chip multiprocessor networks
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Networks on chip
FIFO power optimization for on-chip networks
Proceedings of the 14th ACM Great Lakes symposium on VLSI
An Effective Methodology to Improve the Performance of the Up*/Down* Routing Algorithm
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
A Technology-Aware and Energy-Oriented Topology Exploration for On-Chip Networks
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
The diffusion space of data diffusion architectures
Parallel Computing
Exploring the energy efficiency of cache coherence protocols in single-chip multi-processors
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Shared memory computing on clusters with symmetric multiprocessors and system area networks
ACM Transactions on Computer Systems (TOCS)
Characterization of TCC on Chip-Multiprocessors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations
IEEE Transactions on Computers
Architectural Semantics for Practical Transactional Memory
Proceedings of the 33rd annual international symposium on Computer Architecture
Running OpenMP applications efficiently on an everything-shared SDSM
Journal of Parallel and Distributed Computing - Special issue: 18th International parallel and distributed processing symposium
Cache coherence tradeoffs in shared-memory MPSoCs
ACM Transactions on Embedded Computing Systems (TECS)
Synchronization-driven dynamic speed scaling for MPSoCs
Proceedings of the 2006 international symposium on Low power electronics and design
Tradeoffs in transactional memory virtualization
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Probabilistic accuracy bounds for fault-tolerant computations that discard tasks
Proceedings of the 20th annual international conference on Supercomputing
Architectural implications of brick and mortar silicon manufacturing
Proceedings of the 34th annual international symposium on Computer architecture
An Integrated Methodology for the Verification of Directory-Based Cache Protocols
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Reducing the Write Traffic for a Hybrid Cache Protocol
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Message-driven relaxed consistency in a software distributed shared memory
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Software write detection for a distributed shared memory
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
WINSYM'99 Proceedings of the 3rd conference on USENIX Windows NT Symposium - Volume 3
False sharing and its effect on shared memory performance
Sedms'93 USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems - Volume 4
Application-aware snoop filtering for low-power cache coherence in embedded multiprocessors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Improving the accuracy of snoop filtering using stream registers
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
The Journal of Supercomputing
Distributed and low-power synchronization architecture for embedded multiprocessors
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Communication Based Proactive Link Power Management
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Frequent value compression in packet-based NoC architectures
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Energy-optimal synchronization primitives for single-chip multi-processors
Proceedings of the 19th ACM Great Lakes symposium on VLSI
Design and implementation of an agent home scheme strategy for prefetch-based DSM systems
International Journal of Parallel Programming
Evaluating SPLASH-2 Applications Using MapReduce
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
Experience with building a commodity intel-based ccNUMA system
IBM Journal of Research and Development
Energy reduction for STT-RAM using early write termination
Proceedings of the 2009 International Conference on Computer-Aided Design
An extensible infrastructure for benchmarking multi-core processors based systems
SPECTS'09 Proceedings of the 12th international conference on Symposium on Performance Evaluation of Computer & Telecommunication Systems
A transformation to provide deadlock-free programs
ICCS'03 Proceedings of the 2003 international conference on Computational science: PartII
IDEAL'07 Proceedings of the 8th international conference on Intelligent data engineering and automated learning
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Journal of Parallel and Distributed Computing
Adaptive multi-threading for dynamic workloads in embedded multiprocessors
SBCCI '10 Proceedings of the 23rd symposium on Integrated circuits and system design
Corey: an operating system for many cores
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Microprocessors & Microsystems
Low-cost and energy-efficient distributed synchronization for embedded multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
DRAIN: distributed recovery architecture for inaccessible nodes in multi-core chips
Proceedings of the 48th Design Automation Conference
Dynamic, multi-core cache coherence architecture for power-sensitive mobile processors
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Optimizing the Barnes-Hut algorithm in UPC
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
A new hybrid directory scheme for shared memory multi-processors
CSR'06 Proceedings of the First international computer science conference on Theory and Applications
Communication based proactive link power management
Transactions on High-Performance Embedded Architectures and Compilers IV
BaLinda Lisp: Design and implementation
Computer Languages
Optimizing floating point operations in Scheme
Computer Languages
Performance characterization of global address space applications: a case study with NWChem
Concurrency and Computation: Practice & Experience
Efficient multicast schemes for 3-D Networks-on-Chip
Journal of Systems Architecture: the EUROMICRO Journal
Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach
The Journal of Supercomputing
Hi-index | 0.03 |
We present the Stanford Parallel Applications for Shared-Memory (SPLASH), a set of parallel applications for use in the design and evaluation of shared-memory multiprocessing systems. Our goal is to provide a suite of realistic applications that will serve as a well-documented and consistent basis for evaluation studies. We describe the applications currently in the suite in detail, discuss some of their important characteristics, and explore their behavior by running them on a real multiprocessor as well as on a simulator of an idealized parallel architecture. We expect the current set of applications to act as a nucleus for a suite that will grow with time.