Deadlock-free scheduling of X10 computations with bounded resources

Authors:
Shivali Agarwal;Rajkishore Barik;Dan Bonachea;Vivek Sarkar;Rudrapatna K. Shyamasundar;Katherine Yelick
Affiliations:
Tata Institute of Fundamental Research, Mumbai, India;IBM India Research Lab, New Delhi, India;University of California at Berkeley, California and Lawrence Berkeley National Laboratory, California;IBM T.J. Watson Research Center, New York;IBM India Research Lab, New Delhi, India;University of California at Berkeley, California and Lawrence Berkeley National Laboratory, California
Venue:
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Year:
2007

Abstract

In this paper, we address the problem of guaranteeing the absence of physical deadlock in the execution of a parallel program using the async, finish, atomic, and place constructs from the X10 language. First, we extend previous work-stealing memory bound results for fully strict multithreaded computations to terminally strict multithreaded computations, in which one activity may wait for completion of a descendant activity (as in X10's async and finish constructs), not just an immediate child (as in Cilk's spawn and sync constructs). This result establishes physical deadlock freedom for SMP deployments. Second, we introduce a new class of X10 deployments for clusters, which builds on an underlying Active Message network and the new concept of Doppelgänger mode execution of X10 activities. Third, we use this new class of deployments to establish physical deadlock freedom for deployments on clusters of uniprocessors. Together, these results give the user the ability to execute a rich set of programs written with the async, finish, atomic, and place constructs without worrying about the possibility of physical deadlock due to limits on computation, memory, and communication resources. A major open topic for future work is to extend these results to deployments on clusters of SMPs.
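
The distinction between fully strict and terminally strict computations can be illustrated with a small sketch. The Python code below is not X10 and not the paper's scheduler: the `Finish` class, its `async_` method, and `count_activities` are hypothetical names chosen for illustration. It only shows the dependence structure, in which a single enclosing scope waits for every transitively spawned activity while intermediate activities never wait for their own children. It uses one thread per activity, so it says nothing about bounded resources or work stealing.

```python
import threading


class Finish:
    """Hypothetical stand-in for a finish scope: wait() returns only
    after every activity spawned through this scope, directly or by a
    descendant, has completed."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def async_(self, fn, *args):
        # Register the activity before it starts. A parent registers its
        # children while it is still counted itself, so the counter cannot
        # drop to zero while descendants are still outstanding.
        with self._cond:
            self._pending += 1

        def run():
            try:
                fn(*args)
            finally:
                with self._cond:
                    self._pending -= 1
                    if self._pending == 0:
                        self._cond.notify_all()

        threading.Thread(target=run).start()

    def wait(self):
        with self._cond:
            while self._pending > 0:
                self._cond.wait()


def count_activities(depth):
    """Spawn a binary tree of activities of the given depth under one
    finish scope and return how many activities ran."""
    visited = []
    lock = threading.Lock()
    scope = Finish()

    def visit(level):
        with lock:
            visited.append(level)
        if level < depth:
            # The spawning activity returns without waiting for these
            # children; only the enclosing scope waits for them.
            scope.async_(visit, level + 1)
            scope.async_(visit, level + 1)

    scope.async_(visit, 0)
    scope.wait()  # waits for grandchildren and deeper descendants too
    return len(visited)
```

In a fully strict (Cilk-style) version, each `visit` call would itself wait for its two children before returning. Here only the root scope waits, which is the descendant-waiting pattern the abstract attributes to async and finish.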