SPLASH: Stanford parallel applications for shared-memory

Authors:
Jaswinder Pal Singh; Wolf-Dietrich Weber; Anoop Gupta
Venue:
ACM SIGARCH Computer Architecture News
Year:
1992

References (works cited by this paper): 12
Cited by: 240

A fast algorithm for particle simulations

Journal of Computational Physics
Sparse matrix test problems

ACM Transactions on Mathematical Software (TOMS)
The parallel decomposition and implementation of an integrated circuit global router

PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Techniques for improving the performance of sparse matrix factorization on multiprocessor workstations

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Finding and exploiting parallelism in an ocean simulation program: experience, results, and implications

Journal of Parallel and Distributed Computing
Parallel hierarchical N-body methods
LocusRoute: a parallel global router for standard cells

DAC '88 Proceedings of the 25th ACM/IEEE Design Automation Conference
Asynchronous distributed simulation via a sequence of parallel computations

Communications of the ACM - Special issue on simulation modeling and statistical computing
Numerical Initial Value Problems in Ordinary Differential Equations
Computer Solution of Large Sparse Positive Definite
Analysis of Parallelism and Deadlocks in Distributed-Time Logic Simulation
Tango: A Multiprocessor Simulation and Tracing System

Cooperative shared memory: software and hardware for scalable multiprocessors

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Data locality and load balancing in COOL

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Cooperative shared memory: software and hardware for scalable multiprocessors

ACM Transactions on Computer Systems (TOCS)
The detection and elimination of useless misses in multiprocessors

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An adaptive cache coherence protocol optimized for migratory sharing

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Mechanisms for cooperative shared memory

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The performance of cache-coherent ring-based multiprocessors

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Dynamic control of performance monitoring on large scale parallel systems

ICS '93 Proceedings of the 7th international conference on Supercomputing
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers

SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The influence of random delays on parallel execution times

SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Effectiveness of trace sampling for performance debugging tools

SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Developing parallel applications using high-performance simulation

PADD '93 Proceedings of the 1993 ACM/ONR workshop on Parallel and distributed debugging
An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
An evaluation of directory protocols for medium-scale shared-memory multiprocessors

ICS '94 Proceedings of the 8th international conference on Supercomputing
Performance evaluation of hybrid hardware and software distributed shared memory protocols

ICS '94 Proceedings of the 8th international conference on Supercomputing
Cost/performance of a parallel computer simulator

PADS '94 Proceedings of the eighth workshop on Parallel and distributed simulation
Evaluating the memory overhead required for COMA architectures

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Impact of sharing-based thread placement on multithreaded architectures

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Combined performance gains of simple cache protocol extensions

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A performance study of software and hardware data prefetching schemes

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Stanford FLASH multiprocessor

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and Typhoon: user-level shared memory

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Scheduling and page migration for multiprocessor compute servers

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Surpassing the TLB performance of superpages with less operating system support

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The performance impact of flexibility in the Stanford FLASH multiprocessor

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Simple compiler algorithms to reduce ownership overhead in cache coherence protocols

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Fine-grain access control for distributed shared memory

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Interleaving: a multithreading technique targeting multiprocessors and workstations

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The effectiveness of multiple hardware contexts

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler transformations for high-performance computing

ACM Computing Surveys (CSUR)
Optimizing parallel programs with explicit synchronization

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data and computation transformations for multiprocessors

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimistic active messages: a mechanism for scheduling communication with computation

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Future applicability of bus-based shared memory multiprocessors

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Efficient shared memory with minimal hardware support

ACM SIGARCH Computer Architecture News
SM-prof: a tool to visualise and find cache coherence performance bottlenecks in multiprocessor programs

Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The MIT Alewife machine: architecture and performance

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Boosting the performance of hybrid snooping cache protocols

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A new page table for 64-bit address spaces

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
CRL: high-performance all-software distributed shared memory

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Message passing versus distributed shared memory on networks of workstations

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
The benefits of clustering in shared address space multiprocessors: an applications-driven investigation

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Lazy release consistency for hardware-coherent multiprocessors

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Communication optimizations for parallel computing using data access information

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Techniques for reducing overheads of shared-memory multiprocessing

ICS '95 Proceedings of the 9th international conference on Supercomputing
A compiler algorithm that reduces read latency in ownership-based cache coherence protocols

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Evaluating the impact of advanced memory systems on compiler-parallelized codes

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Efficient strategies for software-only protocols in shared-memory multiprocessors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Evaluation of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Commutativity analysis: a new analysis framework for parallelizing compilers

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Evaluation of design alternatives for a multiprocessor microprocessor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Missing the memory wall: the case for processor/memory integration

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Effective distributed scheduling of parallel workloads

Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Using dataflow analysis techniques to reduce ownership overhead in cache coherence protocols

ACM Transactions on Programming Languages and Systems (TOPLAS)
An evaluation of memory consistency models for shared-memory systems with ILP processors

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Shasta: a low overhead, software-only approach for supporting fine-grain shared memory

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Operating system support for improving data locality on CC-NUMA compute servers

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Synchronization hardware for networks of workstations: performance vs. cost

ICS '96 Proceedings of the 10th international conference on Supercomputing
Evaluating virtual channels for cache-coherent shared-memory multiprocessors

ICS '96 Proceedings of the 10th international conference on Supercomputing
Conservative circuit simulation on shared-memory multiprocessors

PADS '96 Proceedings of the tenth workshop on Parallel and distributed simulation
Optimistic simulation of parallel architectures using program executables

PADS '96 Proceedings of the tenth workshop on Parallel and distributed simulation
Scheduler-conscious synchronization

ACM Transactions on Computer Systems (TOCS)
Modeling cost/performance of a parallel computer simulator

ACM Transactions on Modeling and Computer Simulation (TOMACS)
Characterizing the Memory Behavior of Compiler-Parallelized Applications

IEEE Transactions on Parallel and Distributed Systems
Performance of Multistage Bus Networks for a Distributed Shared Memory Multiprocessor

IEEE Transactions on Parallel and Distributed Systems
Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Dynamic feedback: an effective technique for adaptive computing

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Compiler support for decoupled virtual shared memory systems

PODC '97 Proceedings of the sixteenth annual ACM symposium on Principles of distributed computing
Parallel timing simulation on a distributed memory multiprocessor

ICCAD '93 Proceedings of the 1993 IEEE/ACM international conference on Computer-aided design
Performance evaluation of message-driven parallel VLSI CAD applications on general purpose multiprocessors

ICS '97 Proceedings of the 11th international conference on Supercomputing
Synchronization transformations for parallel computing

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Ace: linguistic mechanisms for customizable protocols

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Tradeoffs between false sharing and aggregation in software distributed shared memory

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Effective fine-grain synchronization for automatically parallelized programs using optimistic synchronization primitives

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
The interaction of software prefetching with ILP processors in shared-memory systems

Proceedings of the 24th annual international symposium on Computer architecture
VM-based shared memory on low-latency, remote-memory-access networks

Proceedings of the 24th annual international symposium on Computer architecture
Efficient synchronization: let them eat QOLB

Proceedings of the 24th annual international symposium on Computer architecture
Cashmere-2L: software coherent shared memory on a clustered remote-write network

Proceedings of the sixteenth ACM symposium on Operating systems principles
Commutativity analysis: a new analysis technique for parallelizing compilers

ACM Transactions on Programming Languages and Systems (TOPLAS)
Analytical Prediction of Performance for Cache Coherence Protocols

IEEE Transactions on Computers
Performance evaluation of the Orca shared-object system

ACM Transactions on Computer Systems (TOCS)
A study of three dynamic approaches to handle widely shared data in shared-memory multiprocessors

ICS '98 Proceedings of the 12th international conference on Supercomputing
Design Verification of the S3.mp Cache-Coherent Shared-Memory System

IEEE Transactions on Computers
Informing memory operations: memory performance feedback mechanisms and their applications

ACM Transactions on Computer Systems (TOCS)
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors

Proceedings of the 25th annual international symposium on Computer architecture
Analytic evaluation of shared-memory systems with ILP processors

Proceedings of the 25th annual international symposium on Computer architecture
Protocol-based data-race detection

SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Predicting the performance of distributed virtual shared-memory applications

IBM Systems Journal
The Stanford FLASH multiprocessor

25 years of the international symposia on Computer architecture (selected papers)
Tempest and Typhoon: user-level shared memory

25 years of the international symposia on Computer architecture (selected papers)
The MIT Alewife machine: architecture and performance

25 years of the international symposia on Computer architecture (selected papers)
Performance Evaluation and Cost Analysis of Cache Protocol Extensions for Shared-Memory Multiprocessors

IEEE Transactions on Computers
Hardware Support for Flexible Distributed Shared Memory

IEEE Transactions on Computers
The design, implementation, and evaluation of Jade

ACM Transactions on Programming Languages and Systems (TOPLAS)
The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors

IEEE Transactions on Computers - Special issue on cache memory and related problems
The pool of subsectors cache design

ICS '99 Proceedings of the 13th international conference on Supercomputing
Eliminating synchronization bottlenecks in object-based programs using adaptive replication

ICS '99 Proceedings of the 13th international conference on Supercomputing
Shared virtual memory with automatic update support

ICS '99 Proceedings of the 13th international conference on Supercomputing
Low-level router design and its impact on supercomputer system performance

ICS '99 Proceedings of the 13th international conference on Supercomputing
Data management in networks: experimental evaluation of a provably good strategy

Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
PSCR: A Coherence Protocol for Eliminating Passive Sharing in Shared-Bus Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Eliminating synchronization overhead in automatically parallelized programs using dynamic feedback

ACM Transactions on Computer Systems (TOCS)
Code transformations to improve memory parallelism

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Ace: a language for parallel programming with customizable protocols

ACM Transactions on Computer Systems (TOCS)
Task Spreading and Shrinking on Multiprocessor Systems and Networks of Workstations

IEEE Transactions on Parallel and Distributed Systems
Design Alternatives of Multithreaded Architecture

International Journal of Parallel Programming
Effective fine-grain synchronization for automatically parallelized programs using optimistic synchronization primitives

ACM Transactions on Computer Systems (TOCS)
Impact of CC-NUMA Memory Management Policies on the Application Performance of Multistage Switching Networks

IEEE Transactions on Parallel and Distributed Systems
Design and Evaluation of a Switch Cache Architecture for CC-NUMA Multiprocessors

IEEE Transactions on Computers
Memory Hierarchy Considerations for Cost-Effective Cluster Computing

IEEE Transactions on Computers
Compiler-directed shared-memory communication for iterative parallel applications

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Dynamic computation migration in DSM systems

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Simulation of the 3 dimensional cascade flow with numerical wind tunnel (NWT)

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Improving fine-grained irregular shared-memory benchmarks by data reordering

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Evaluating the performance limitations of MPMD communication

SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Speculative lock elision: enabling highly concurrent multithreaded execution

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Integrating non-blocking synchronisation in parallel applications: performance advantages and methodologies

WOSP '02 Proceedings of the 3rd international workshop on Software and performance
Run-time support for distributed sharing in safe languages

ACM Transactions on Computer Systems (TOCS)
Run-time adaptation in River

ACM Transactions on Computer Systems (TOCS)
An Effective Logical Cache for a Clustered LRC-Based DSM System

Cluster Computing
Application-specific protocols for user-level shared memory

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Paging tradeoffs in distributed-shared-memory multiprocessors

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Performance Tuning Software DSM Applications using Visualisation

The Journal of Supercomputing
Transactional lock-free execution of lock-based programs

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Design and analysis of static memory management policies for CC-NUMA Multiprocessors

Journal of Systems Architecture: the EUROMICRO Journal
Complete Computer System Simulation: The SimOS Approach

IEEE Parallel & Distributed Technology: Systems & Technology
Using the Cowichan Problems to Assess the Usability of Orca

IEEE Parallel & Distributed Technology: Systems & Technology
Fault Tolerance and Configurability in DSM Coherence Protocols

IEEE Concurrency
Benchmarking with Real Industrial Applications: The SPEC High-Performance Group

IEEE Computational Science & Engineering
COOL: An Object-Based Language for Parallel Programming

Computer
Tuning Memory Performance of Sequential and Parallel Programs

Computer
Application Performance on the MIT Alewife Machine

Computer
Multiprocessors Should Support Simple Memory-Consistency Models

Computer
Performance Evaluation of the Slotted Ring Multiprocessor

IEEE Transactions on Computers
Sequential Hardware Prefetching in Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Eliminating synchronization bottlenecks using adaptive replication

ACM Transactions on Programming Languages and Systems (TOPLAS)
Hybrid compiler/hardware prefetching for multiprocessors using low-overhead cache miss traps

ICPP '97 Proceedings of the international Conference on Parallel Processing
An adaptive sequential prefetching scheme in shared-memory multiprocessors

ICPP '97 Proceedings of the international Conference on Parallel Processing
Application-Controlled Coherence Protocols for Scope Consistent Software DSMs

HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
A Virtual Memory Model for Parallel Supercomputers

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Maximizing Speedup through Self-Tuning of Processor Allocation

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Categorizing Network Traffic in Update-Based Protocols on Scalable Multiprocessors

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Implementation of Scalable Blocking Locks Using an Adaptive Thread Scheduler

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Relative Performance of Hardware and Software-Only Directory Protocols Under Latency Tolerating and Reducing Techniques

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Performance of On-Chip Multiprocessors for Vision Tasks

IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Priority Based Messaging for Software Distributed Shared Memory

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
A Critique of ESP

IPDPS '00/JSSPP '00 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
A Node Count-Independent Logical Clock for Scaling Lazy Release Consistency Protocol

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
A Graph-Oriented Task Manager for Small Multiprocessor Systems

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Optimizing Mutual Exclusion Synchronization in Explicitly Parallel Programs

LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Run-Time Support for Distributed Sharing in Typed Languages

LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Commutativity Analysis: A Technique for Automatically Parallelizing Pointer-Based Computations

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
A New Parallelism Management Scheme for Multiprocessor Systems

ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Multiple-writer entry consistency

Cluster Computing
On cache memory hierarchy for Chip-Multiprocessor

ACM SIGARCH Computer Architecture News
An efficient causal logging scheme for recoverable distributed shared memory systems

Parallel Computing
Compactly representing parallel program executions

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Inferential queueing and speculative push for reducing critical communication latencies

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Shared memory computing on SP2: JIAJIA approach

CASCON '98 Proceedings of the 1998 conference of the Centre for Advanced Studies on Collaborative research
Supporting multiple parallel programming paradigms on top of the Millipede virtual parallel machine

HIPS '97 Proceedings of the 1997 Workshop on High-Level Programming Models and Supportive Environments (HIPS '97)
Effectiveness of hardware-based stride and sequential prefetching in shared-memory multiprocessors

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Software cache coherence for large scale multiprocessors

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Two techniques for improving performance on bus-based multiprocessors

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Two Adaptive Hybrid Cache Coherency Protocols

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Using memory-mapped network interfaces to improve the performance of distributed shared memory

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Bus-based COMA-reducing traffic in shared-bus multiprocessors

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
The impact of shared-cache clustering in small-scale shared-memory multiprocessors

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
A Hierarchical Memory Directory Scheme Via Extending SCI for Large-Scale Multiprocessors

HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
A model for parallel simulation of distributed shared memory

MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Impact of Switch Design on the Application Performance of Cache-Coherent Multiprocessors

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Object Consistency - A New Model for Distributed Memory Systems

IWOOOS '95 Proceedings of the 4th International Workshop on Object-Orientation in Operating Systems
Virtual memory on data diffusion architectures

Parallel Computing
Parallel program performance prediction using deterministic task graph analysis

ACM Transactions on Computer Systems (TOCS)
Synthesis of Partitioned Shared Memory Architectures for Energy-Efficient Multi-Processor SoC

Proceedings of the conference on Design, automation and test in Europe - Volume 1
A first glance at Kilo-instruction based multiprocessors

Proceedings of the 1st conference on Computing frontiers
Packetization and routing analysis of on-chip multiprocessor networks

Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Networks on chip
FIFO power optimization for on-chip networks

Proceedings of the 14th ACM Great Lakes symposium on VLSI
An Effective Methodology to Improve the Performance of the Up*/Down* Routing Algorithm

IEEE Transactions on Parallel and Distributed Systems
An Architecture for High-Performance Scalable Shared-Memory Multiprocessors Exploiting On-Chip Integration

IEEE Transactions on Parallel and Distributed Systems
A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Exploring Energy/Performance Tradeoffs in Shared Memory MPSoCs: Snoop-Based Cache Coherence vs. Software Solutions

Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
A Technology-Aware and Energy-Oriented Topology Exploration for On-Chip Networks

Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
The diffusion space of data diffusion architectures

Parallel Computing
Exploring the energy efficiency of cache coherence protocols in single-chip multi-processors

GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Shared memory computing on clusters with symmetric multiprocessors and system area networks

ACM Transactions on Computer Systems (TOCS)
Characterization of TCC on Chip-Multiprocessors

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations

IEEE Transactions on Computers
Architectural Semantics for Practical Transactional Memory

Proceedings of the 33rd annual international symposium on Computer Architecture
Running OpenMP applications efficiently on an everything-shared SDSM

Journal of Parallel and Distributed Computing - Special issue: 18th International parallel and distributed processing symposium
Cache coherence tradeoffs in shared-memory MPSoCs

ACM Transactions on Embedded Computing Systems (TECS)
Synchronization-driven dynamic speed scaling for MPSoCs

Proceedings of the 2006 international symposium on Low power electronics and design
Tradeoffs in transactional memory virtualization

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Probabilistic accuracy bounds for fault-tolerant computations that discard tasks

Proceedings of the 20th annual international conference on Supercomputing
Architectural implications of brick and mortar silicon manufacturing

Proceedings of the 34th annual international symposium on Computer architecture
An Integrated Methodology for the Verification of Directory-Based Cache Protocols

ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Reducing the Write Traffic for a Hybrid Cache Protocol

ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Message-driven relaxed consistency in a software distributed shared memory

OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Software write detection for a distributed shared memory

OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Windows NT in a ccNUMA system

WINSYM'99 Proceedings of the 3rd conference on USENIX Windows NT Symposium - Volume 3
False sharing and its effect on shared memory performance

Sedms'93 USENIX Symposium on Experiences with Distributed and Multiprocessor Systems - Volume 4
Application-aware snoop filtering for low-power cache coherence in embedded multiprocessors

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Improving the accuracy of snoop filtering using stream registers

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Improving the performance of large interconnection networks using congestion-control mechanisms

Performance Evaluation
An energy consumption characterization of on-chip interconnection networks for tiled CMP architectures

The Journal of Supercomputing
Distributed and low-power synchronization architecture for embedded multiprocessors

CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Communication Based Proactive Link Power Management

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Frequent value compression in packet-based NoC architectures

Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Energy-optimal synchronization primitives for single-chip multi-processors

Proceedings of the 19th ACM Great Lakes symposium on VLSI
Design and implementation of an agent home scheme strategy for prefetch-based DSM systems

International Journal of Parallel Programming
Evaluating SPLASH-2 Applications Using MapReduce

APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
Experience with building a commodity intel-based ccNUMA system

IBM Journal of Research and Development
Energy reduction for STT-RAM using early write termination

Proceedings of the 2009 International Conference on Computer-Aided Design
An extensible infrastructure for benchmarking multi-core processors based systems

SPECTS'09 Proceedings of the 12th international conference on Symposium on Performance Evaluation of Computer & Telecommunication Systems
A transformation to provide deadlock-free programs

ICCS'03 Proceedings of the 2003 international conference on Computational science: Part II
SCSTallocator: sized and call-site tracing-based shared memory allocator for false sharing reduction in page-based DSM systems

IDEAL'07 Proceedings of the 8th international conference on Intelligent data engineering and automated learning
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling

ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
A new congestion control algorithm for improving the performance of a broadcast-based multiprocessor architecture

Journal of Parallel and Distributed Computing
Adaptive multi-threading for dynamic workloads in embedded multiprocessors

SBCCI '10 Proceedings of the 23rd symposium on Integrated circuits and system design
Corey: an operating system for many cores

OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Reducing the latency of L2 misses in shared-memory multiprocessors through on-chip directory integration

EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
On an efficient NoC multicasting scheme in support of multiple applications running on irregular sub-networks

Microprocessors & Microsystems
Low-cost and energy-efficient distributed synchronization for embedded multiprocessors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
DRAIN: distributed recovery architecture for inaccessible nodes in multi-core chips

Proceedings of the 48th Design Automation Conference
Dynamic, multi-core cache coherence architecture for power-sensitive mobile processors

CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Optimizing the Barnes-Hut algorithm in UPC

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
CSTallocator: call-site tracing based shared memory allocator for false sharing reduction in page-based DSM systems

HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
A new hybrid directory scheme for shared memory multi-processors

CSR'06 Proceedings of the First international computer science conference on Theory and Applications
Communication based proactive link power management

Transactions on High-Performance Embedded Architectures and Compilers IV
BaLinda Lisp: Design and implementation

Computer Languages
Optimizing floating point operations in Scheme

Computer Languages
Research: Characterizing and scheduling communication interactions of parallel and local jobs on networks of workstations

Computer Communications
Performance characterization of global address space applications: a case study with NWChem

Concurrency and Computation: Practice & Experience
Efficient multicast schemes for 3-D Networks-on-Chip

Journal of Systems Architecture: the EUROMICRO Journal
Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach

The Journal of Supercomputing

Abstract

We present the Stanford Parallel Applications for Shared-Memory (SPLASH), a set of parallel applications for use in the design and evaluation of shared-memory multiprocessing systems. Our goal is to provide a suite of realistic applications that will serve as a well-documented and consistent basis for evaluation studies. We describe the applications currently in the suite in detail, discuss some of their important characteristics, and explore their behavior by running them on a real multiprocessor as well as on a simulator of an idealized parallel architecture. We expect the current set of applications to act as a nucleus for a suite that will grow with time.