Parallel computing is increasingly important in the solution of large-scale numerical problems. The difficulty of efficiently hand-coding parallelism, and the limitations of parallelizing compilers, have nonetheless restricted its use by scientific programmers.

In this paper we propose a new paradigm, chores, for the run-time support of parallel computing on shared-memory multiprocessors. We consider specifically uniform memory access shared-memory environments, although the chore paradigm should also be appropriate for use within the clusters of a large-scale nonuniform memory access machine.

We argue that chore systems attain both the high efficiency of compiler approaches for the common case of data parallelism, and the flexibility and performance of user-level thread approaches for functional parallelism. These benefits are achieved within a single, simple conceptual model that almost entirely relieves the programmer and compiler from concerns of granularity, scheduling, and enforcement of synchronization constraints. Measurements of a prototype implementation demonstrate that the chore model can be supported more efficiently than can traditional approaches to either data or functional parallelism alone.