Evaluation of the SPUR Lisp architecture
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Computer
801 storage: architecture and programming
ACM Transactions on Computer Systems (TOCS)
Memory coherence in shared virtual memory systems
ACM Transactions on Computer Systems (TOCS)
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Run-time scheduling and execution of loops on message passing machines
Journal of Parallel and Distributed Computing - Special issue: algorithms for hypercube computers
Run-Time Parallelization and Scheduling of Loops
IEEE Transactions on Computers
Virtual memory primitives for user programs
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The Stanford Dash Multiprocessor
Computer
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Comparative performance evaluation of cache-coherent NUMA and COMA architectures
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
T: a multithreaded massively parallel architecture
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The network architecture of the Connection Machine CM-5 (extended abstract)
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
A tightly-coupled processor-network interface
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Integrating message-passing and shared-memory: early experience
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Cooperative shared memory: software and hardware for scalable multiprocessors
ACM Transactions on Computer Systems (TOCS)
Evaluation of mechanisms for fine-grained parallel programs in the J-machine and the CM-5
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Parallel programming in Split-C
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Compiling for shared-memory and message-passing computers
ACM Letters on Programming Languages and Systems (LOPLAS)
ICS '90 Proceedings of the 4th international conference on Supercomputing
Monsoon: an explicit token-store architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Compiling Global Name-Space Parallel Loops for Distributed Execution
IEEE Transactions on Parallel and Distributed Systems
The DASH Prototype: Logic Overhead and Performance
IEEE Transactions on Parallel and Distributed Systems
Kernel Support for the Wisconsin Wind Tunnel
USENIX Microkernels and Other Kernel Architectures Symposium
Universal Mechanisms for Concurrency
PARLE '89 Proceedings of the Parallel Architectures and Languages Europe, Volume I: Parallel Architectures
Compiling for shared-memory and message-passing computers
ACM Letters on Programming Languages and Systems (LOPLAS)
Reactive synchronization algorithms for multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Integration of message passing and shared memory in the Stanford FLASH multiprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Where is time spent in message-passing and shared-memory programs?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
AP1000+: architectural support of PUT/GET interface for parallelizing compiler
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
LCM: memory system support for parallel language implementation
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The performance advantages of integrating block data transfer in cache-coherent multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The performance impact of flexibility in the Stanford FLASH multiprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Fine-grain access control for distributed shared memory
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
IBM Systems Journal
Software caching and computation migration in Olden
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Efficient support for irregular applications on distributed-memory machines
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Remote queues: exposing message queues for optimization and atomicity
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
A comprehensive bibliography of distributed shared memory
ACM SIGOPS Operating Systems Review
Efficient shared memory with minimal hardware support
ACM SIGARCH Computer Architecture News
Memory system performance of UNIX on CC-NUMA multiprocessors
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The interaction of parallel and sequential workloads on a network of workstations
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
CRL: high-performance all-software distributed shared memory
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
StormWatch: a tool for visualizing memory system protocols
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Lazy release consistency for hardware-coherent multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Architectural mechanisms for explicit communication in shared memory multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
A compiler-directed distributed shared memory system
ICS '95 Proceedings of the 9th international conference on Supercomputing
ICS '95 Proceedings of the 9th international conference on Supercomputing
Evaluating the impact of advanced memory systems on compiler-parallelized codes
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Efficient strategies for software-only protocols in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Teapot: language support for writing memory coherence protocols
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Decoupled hardware support for distributed shared memory
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Application and architectural bottlenecks in large scale distributed shared memory machines
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Coherent network interfaces for fine-grain communication
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Informing memory operations: providing memory performance feedback in modern processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Integrating performance monitoring and communication in parallel computers
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Synchronization and communication in the T3E multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Hiding communication latency and coherence overhead in software DSMs
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
An analysis of dag-consistent distributed shared-memory algorithms
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
The GLOW cache coherence protocol extensions for widely shared data
ICS '96 Proceedings of the 10th international conference on Supercomputing
A cost-comparison approach for adaptive distributed shared memory
ICS '96 Proceedings of the 10th international conference on Supercomputing
Synchronization hardware for networks of workstations: performance vs. cost
ICS '96 Proceedings of the 10th international conference on Supercomputing
Static analysis to reduce synchronization costs in data-parallel programs
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
The SHRIMP performance monitor: design and applications
SPDT '96 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Trap-driven memory simulation with Tapeworm II
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Active memory: a new abstraction for memory system simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Characterizing the Memory Behavior of Compiler-Parallelized Applications
IEEE Transactions on Parallel and Distributed Systems
Compiler and software distributed shared memory support for irregular applications
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
The interaction of parallel programming constructs and coherence protocols
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Ace: linguistic mechanisms for customizable protocols
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimizing communication in HPF programs on fine-grain distributed shared memory
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Relaxed consistency and coherence granularity in DSM systems: a performance evaluation
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Shared-memory performance profiling
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Effects of communication latency, overhead, and bandwidth in a cluster architecture
Proceedings of the 24th annual international symposium on Computer architecture
Efficient synchronization: let them eat QOLB
Proceedings of the 24th annual international symposium on Computer architecture
Coherence controller architectures for SMP-based CC-NUMA multiprocessors
Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Informing memory operations: memory performance feedback mechanisms and their applications
ACM Transactions on Computer Systems (TOCS)
Using prediction to accelerate coherence protocols
Proceedings of the 25th annual international symposium on Computer architecture
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors
Proceedings of the 25th annual international symposium on Computer architecture
Analytic evaluation of shared-memory systems with ILP processors
Proceedings of the 25th annual international symposium on Computer architecture
Adapting the Network Interface for High-Performance Computing: The CNI Approach
The Journal of Supercomputing - Special issue: high performance distributed computing
Protocol-based data-race detection
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Formal verification of complex coherence protocols using symbolic state models
Journal of the ACM (JACM)
Pc-based Shared Memory Architecture and Language
The Journal of Supercomputing
The MIT Alewife machine: architecture and performance
25 years of the international symposia on Computer architecture (selected papers)
Evaluating the Effect of Coherence Protocols on the Performance of Parallel Programming Constructs
International Journal of Parallel Programming
Hardware Support for Flexible Distributed Shared Memory
IEEE Transactions on Computers
UTLB: a mechanism for address translation on network interfaces
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Tapeworm: high-level abstractions of shared accesses
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
IEEE Transactions on Computers - Special issue on cache memory and related problems
Coherence Controller Architectures for Scalable Shared-Memory Multiprocessors
IEEE Transactions on Computers - Special issue on cache memory and related problems
Memory sharing predictor: the key to a speculative coherent DSM
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Scaling application performance on a cache-coherent multiprocessor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Application scaling under shared virtual memory on a cluster of SMPs
ICS '99 Proceedings of the 13th international conference on Supercomputing
Realizing the performance potential of the virtual interface architecture
ICS '99 Proceedings of the 13th international conference on Supercomputing
IBM Systems Journal
Teapot: A Domain-Specific Language for Writing Cache Coherence Protocols
IEEE Transactions on Software Engineering
Ace: a language for parallel programming with customizable protocols
ACM Transactions on Computer Systems (TOCS)
A high-level abstraction of shared accesses
ACM Transactions on Computer Systems (TOCS)
Software-Based Rerouting for Fault-Tolerant Pipelined Communication
IEEE Transactions on Parallel and Distributed Systems
Selective, accurate, and timely self-invalidation using last-touch prediction
Proceedings of the 27th annual international symposium on Computer architecture
Minimizing Data and Synchronization Costs in One-Way Communication
IEEE Transactions on Parallel and Distributed Systems
Compiler-directed shared-memory communication for iterative parallel applications
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Using hardware performance monitors to isolate memory bottlenecks
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
A synthesis of memory mechanisms for distributed architectures
ICS '01 Proceedings of the 15th international conference on Supercomputing
Optimistic active messages: structuring systems for high-performance communication
EW 6 Proceedings of the 6th workshop on ACM SIGOPS European workshop: Matching operating systems to application needs
ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols
IEEE Transactions on Computers
StarT-Voyager: a flexible platform for exploring scalable SMP issues
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
The effects of communication parameters on end performance of shared virtual memory clusters
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Experiment management support for performance tuning
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Cost effectiveness of an adaptable computing cluster
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Application-specific protocols for user-level shared memory
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Improving the performance of DSM systems via compiler involvement
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Distributed Shared Memory: Concepts and Systems
IEEE Parallel & Distributed Technology: Systems & Technology
How Much Does Network Contention Affect Distributed Shared Memory Performance?
ICPP '97 Proceedings of the international Conference on Parallel Processing
Hardware Versus Software Implementation of COMA
ICPP '97 Proceedings of the international Conference on Parallel Processing
Two Layers Distributed Shared Memory
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
Exploiting the Capabilities of Communications Co-Processors
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Dag-Consistent Distributed Shared Memory
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Aurora: Scoped Behavior for Per-Context Optimized Distributed Data Sharing
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Enhancing Software DSM for Compiler-Parallelized Applications
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Coherent Block Data Transfer in the FLASH Multiprocessor
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Message Passing Vs. Shared Address Space on a Clusters of SMPs
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Processor Mechanisms for Software Shared Memory
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
SIGMA: a simulator infrastructure to guide memory analysis
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Message passing and shared address space parallelism on an SMP cluster
Parallel Computing
User-controllable coherence for high performance shared memory multiprocessors
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Scalability in computing for today and tomorrow
ARVLSI '97 Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97)
Software cache coherence for large scale multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Telegraphos: High-Performance Networking for Parallel Processing on Workstation Clusters
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
CNI: A High-Performance Network Interface for Workstation Clusters
HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Dynamically Controlling False Sharing in Distributed Shared Memory
HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Locking with Different Granularities for Reads and Writes in an MVM System
IDEAS '99 Proceedings of the 1999 International Symposium on Database Engineering & Applications
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Journal of Parallel and Distributed Computing
Tolerating Late Memory Traps in Dynamically Scheduled Processors
IEEE Transactions on Computers
An Analysis of the Cost Effectiveness of an Adaptable Computing Cluster
Cluster Computing
Coherence decoupling: making use of incoherence
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Journal of Systems Architecture: the EUROMICRO Journal
Data Centric Cache Measurement on the Intel ltanium 2 Processor
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
EMPS: An Environment for Memory Performance Studies
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Shared memory computing on clusters with symmetric multiprocessors and system area networks
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
HeapMon: a helper-thread approach to programmable, automatic, and low-overhead memory bug detection
IBM Journal of Research and Development
Software write detection for a distributed shared memory
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Experience with a language for writing coherence protocols
DSL'97 Proceedings of the Conference on Domain-Specific Languages on Conference on Domain-Specific Languages (DSL), 1997
Proceedings of the 21st annual international conference on Supercomputing
The case for simple, visible cache coherency
Proceedings of the 2008 ACM SIGPLAN workshop on Memory systems performance and correctness: held in conjunction with the Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '08)
A case for low-complexity MP architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
A comparative evaluation of hybrid distributed shared-memory systems
Journal of Systems Architecture: the EUROMICRO Journal
A memory system design framework: creating smart memories
Proceedings of the 36th annual international symposium on Computer architecture
High-throughput coherence control and hardware messaging in everest
IBM Journal of Research and Development
Compiling for a hybrid programming model using the LMAD representation
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Cohesion: a hybrid memory model for accelerators
Proceedings of the 37th annual international symposium on Computer architecture
Proceedings of the Conference on Design, Automation and Test in Europe
Multi-view memory to support OS locking for transaction systems
IDEAS'97 Proceedings of the 1997 international conference on International database engineering and applications symposium
A case for RDMA in clouds: turning supercomputer networking into commodity
Proceedings of the Second Asia-Pacific Workshop on Systems
PARDIS: a programmable memory controller for the DDRx interfacing standards
Proceedings of the 39th Annual International Symposium on Computer Architecture
A programmable memory controller for the DDRx interfacing standards
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.01 |
Future parallel computers must efficiently execute not only hand-coded applications but also programs written in high-level, parallel programming languages. Today's machines limit these programs to a single communication paradigm, either message-passing or shared-memory, which results in uneven performance. This paper addresses this problem by defining an interface, Tempest, that exposes low-level communication and memory-system mechanisms so programmers and compilers can customize policies for a given application. Typhoon is a proposed hardware platform that implements these mechanisms with a fully-programmable, user-level processor in the network interface. We demonstrate the utility of Tempest with two examples. First, the Stache protocol uses Tempest's finegrain access control mechanisms to manage part of a processor's local memory as a large, fully-associative cache for remote data. We simulated Typhoon on the Wisconsin Wind Tunnel and found that Stache running on Typhoon performs comparably (±30%) to an all-hardware DirNNB cache-coherence protocol for five shared-memory programs. Second, we illustrate how programmers or compilers can use Tempest's flexibility to exploit an application's sharing patterns with a custom protocol. For the EM3D application, the custom protocol improves performance up to 35% over the all-hardware protocol.