Chores: enhanced run-time support for shared-memory parallel computing

Authors:
Derek L. Eager; John Zahorjan
Affiliations:
Univ. of Saskatchewan, Saskatoon, Sask., Canada;Univ. of Washington, Seattle
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
1993



Abstract

Parallel computing is increasingly important in the solution of large-scale numerical problems. The difficulty of efficiently hand-coding parallelism, and the limitations of parallelizing compilers, have nonetheless restricted its use by scientific programmers.

In this paper we propose a new paradigm, chores, for the run-time support of parallel computing on shared-memory multiprocessors. We consider specifically uniform memory access shared-memory environments, although the chore paradigm should also be appropriate for use within the clusters of a large-scale nonuniform memory access machine.

We argue that chore systems attain both the high efficiency of compiler approaches for the common case of data parallelism, and the flexibility and performance of user-level thread approaches for functional parallelism. These benefits are achieved within a single, simple conceptual model that almost entirely relieves the programmer and compiler from concerns of granularity, scheduling, and enforcement of synchronization constraints. Measurements of a prototype implementation demonstrate that the chore model can be supported more efficiently than can traditional approaches to either data or functional parallelism alone.