Software-extended coherent shared memory: performance and cost
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Reactive synchronization algorithms for multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The performance advantages of integrating block data transfer in cache-coherent multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Implications of hierarchical N-body methods for multiprocessor architectures
ACM Transactions on Computer Systems (TOCS)
MGS: a multigrain shared memory system
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Limits on the performance benefits of multithreading and prefetching
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Towards efficiency and portability: programming with the BSP model
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
An Efficient Tree Cache Coherence Protocol for Distributed Shared Memory Multiprocessors
IEEE Transactions on Computers
The scalability of multigrain systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Portable and Efficient Parallel Computing Using the BSP Model
IEEE Transactions on Computers
ACM Transactions on Computer Systems (TOCS)
Software Techniques for Improving MPP Bulk-Transfer Performance
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Minerva: An Adaptive Subblock Coherence Protocol for Improved SMP Performance
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Scheduling of Algorithms Based on Elimination Trees on NUMA Systems
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Performance Study of a Multithreaded Superscalar Microprocessor
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Analysis of Shared Memory Misses and Reference Patterns
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Bounding energy consumption in large-scale MPI programs
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
FastFwd: an efficient hardware acceleration technique for trace-driven network-on-chip simulation
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
MARSS: a full system simulator for multicore x86 CPUs
Proceedings of the 48th Design Automation Conference
Types, regions, and effects for safe programming with object-oriented parallel frameworks
Proceedings of the 25th European conference on Object-oriented programming
Exploiting temporal decoupling to accelerate trace-driven NoC emulation
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
RaceMob: crowdsourced data race detection
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Using in-flight chains to build a scalable cache coherence protocol
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.01 |
We present the Stanford Parallel Applications for Shared-Memory (SPLASH), a set of parallel applications for use in the design and evaluation of shared-memory multiprocessing systems. Our goal is to provide a suite of realistic applications that will serve as a well-documented and consistent basis for evaluation studies. We describe the applications currently in the suite in detail, discuss and compare some of their important characteristicsPsuch as data locality, granularity, synchronization, etc.Pand explore their behavior by running them on a real multiprocessor as well as on a simulator of an idealized parallel architecture. We expect the current set of applications to act as a nucleus for a suite that will grow with time. This report replaces and updates CSL-TR-91-469, April 1991.