Memory access buffering in multiprocessors
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
VLSI Support for a cactus stack oriented memory organization
Proceedings of the Twenty-First Annual Hawaii International Conference on Architecture Track
The control mechanism for the Myrias parallel computer system
ACM SIGARCH Computer Architecture News - Special Issue: Architectural Support for Operating Systems
The Amber system: parallel programming on a network of multiprocessors
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Memory coherence in shared virtual memory systems
ACM Transactions on Computer Systems (TOCS)
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Implementation and performance of Munin
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
The network architecture of the Connection Machine CM-5 (extended abstract)
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
Parallel programming in Split-C
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Software-extended coherent shared memory: performance and cost
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
LCM: memory system support for parallel language implementation
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Fine-grain access control for distributed shared memory
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
CRL: high-performance all-software distributed shared memory
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
The cilk system for parallel multithreaded computing
The cilk system for parallel multithreaded computing
Executing multithreaded programs efficiently
Executing multithreaded programs efficiently
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Operating Systems Theory
The Function of FUNCTION in LISP, or Why the FUNARG Problem Should be Called the Environment Problem
The Function of FUNCTION in LISP, or Why the FUNARG Problem Should be Called the Environment Problem
TreadMarks: distributed shared memory on standard workstations and operating systems
WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
Software write detection for a distributed shared memory
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
The design and evaluation of a shared object system for distributed memory machines
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Distributed filaments: efficient fine-grain parallelism on a cluster of workstations
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs
IEEE Transactions on Computers
Scheduling multithreaded computations by work stealing
SFCS '94 Proceedings of the 35th Annual Symposium on Foundations of Computer Science
Burroughs' B6500/B7500 stack mechanism
AFIPS '68 (Spring) Proceedings of the April 30--May 2, 1968, spring joint computer conference
An analysis of dag-consistent distributed shared-memory algorithms
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Fine-grain multithreading with the EM-X multiprocessor
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Dag consistent parallel simulation: a predictable and robust conservative algorithm
Proceedings of the eleventh workshop on Parallel and distributed simulation
Computation-centric memory models
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)
Improving the Java memory model using CRF
OOPSLA '00 Proceedings of the 15th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
ATLAS: an infrastructure for global computing
EW 7 Proceedings of the 7th workshop on ACM SIGOPS European workshop: Systems support for worldwide applications
Y-Invalidate: A New Protocol for Implementing Weak Consistency in DSM Systems
International Journal of Parallel Programming
Supporting multiple parallel programming paradigms on top of the Millipede virtual parallel machine
HIPS '97 Proceedings of the 1997 Workshop on High-Level Programming Models and Supportive Environments (HIPS '97)
A unified theory of shared memory consistency
Journal of the ACM (JACM)
Adaptive and reliable parallel computing on networks of workstations
ATEC '97 Proceedings of the annual conference on USENIX Annual Technical Conference
Brief announcement: low depth cache-oblivious sorting
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Low depth cache-oblivious algorithms
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Task superscalar: using processors as functional units
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Brief announcement: communication bounds for heterogeneous architectures
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Scheduling irregular parallel computations on hierarchical caches
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Automatic generation of software pipelines for heterogeneous parallel systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
We introduce dag consistency, a relaxed consistency model for distributed shared memory which is suitable for multithreaded programming. We have implemented dag consistency in software for the Cilk multithreaded runtime system running on a Connection Machine CM5. Our implementation includes a dag-consistent distributed cactus stack for storage allocation. We provide empirical evidence of the flexibility and efficiency of dag consistency for applications that include blocked matrix multiplication, Strassen's matrix multiplication algorithm, and a Barnes-Hut code. Although Cilk schedules the executions of these programs dynamically, their performances are competitive with statically scheduled implementations in the literature. We also prove that the number F_P of page faults incurred by a user program running on P processors can be related to the number F_1 of page faults running serially by the formula F_P \leq F_1 + 2Cs, where C is the cache size and s is the number of thread migrations executed by Cilk's scheduler.