Cedar—a large scale multiprocessor
Advanced computer architecture
The NYU ultracomputer—designing a MIMD, shared-memory parallel machine
25 years of the international symposia on Computer architecture (selected papers)
Memory access buffering in multiprocessors
25 years of the international symposia on Computer architecture (selected papers)
Commit-reconcile & fences (CRF): a new memory model for architects and compiler writers
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A Unified Formalization of Four Shared-Memory Models
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Memory Model = Instruction Reordering + Store Atomicity
Proceedings of the 33rd annual international symposium on Computer Architecture
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs
IEEE Transactions on Computers
Prefetching irregular references for software cache on cell
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Orchestrating data transfer for the cell/B.E. processor
Proceedings of the 22nd annual international conference on Supercomputing
Foundations of the C++ concurrency memory model
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Toward an Automatic Code Layout Methodology
IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
Hybrid access-specific software cache techniques for the cell BE architecture
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
COMIC: a coherent shared memory interface for cell be
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Complete formal specification of the OpenMP memory model
International Journal of Parallel Programming
Hi-index | 0.00 |
This paper is motivated by the desire to provide an efficient and scalable software cache implementation of OpenMP on multicore and manycore architectures in general, and on the IBM CELL architecture in particular. In this paper, we propose an instantiation of the OpenMP memory model with the following advantages: (1) The proposed instantiation prohibits undefined values that may cause problems of safety, security, programming and debugging. (2) The proposed instantiation is scalable with respect to the number of threads because it does not rely on communication among threads or a centralized directory that maintains consistency of multiple copies of each shared variable. (3) The proposed instantiation avoids the ambiguity of the original memory model definition proposed on the OpenMP Specification 3.0. We also introduce a new cache protocol for this instantiation, which can be implemented as a software-controlled cache. Experimental results on the Cell Broadband Engine show that our instantiation results in nearly linear speedup with respect to the number of threads for a number of NAS Parallel Benchmarks. The results also show a clear advantage when comparing it to a software cache design derived from a stronger memory model that maintains a global total ordering among flush operations.