Composing parallel software efficiently with Lithe

Authors:
Heidi Pan; Benjamin Hindman; Krste Asanović
Affiliations:
Massachusetts Institute of Technology, Cambridge, MA, USA; University of California, Berkeley, Berkeley, CA, USA; University of California, Berkeley, Berkeley, CA, USA
Venue:
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Year:
2010

Abstract

Applications composed of multiple parallel libraries perform poorly when those libraries interfere with one another by obliviously using the same physical cores, leading to destructive resource oversubscription. This paper presents the design and implementation of Lithe, a low-level substrate that provides the basic primitives and a standard interface for composing parallel codes efficiently. Lithe can be inserted underneath the runtimes of legacy parallel libraries to provide bolt-on composability without needing to change existing application code. Lithe can also serve as the foundation for building new parallel abstractions and libraries that automatically interoperate with one another. In this paper, we show that versions of Threading Building Blocks (TBB) and OpenMP ported to Lithe perform competitively with their original implementations. Furthermore, for two applications composed of multiple parallel libraries, we show that leveraging our substrate outperforms their original, even expertly tuned, implementations.