Dynamic partitioning in a transputer environment
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Scheduler activations: effective kernel support for the user-level management of parallelism
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Lazy threads: implementing a fast parallel call
Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
First-class user-level threads
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Parallel programming in OpenMP
Parallel programming in OpenMP
Basic Linear Algebra Subprograms for Fortran Usage
ACM Transactions on Mathematical Software (TOMS)
Threads for Interoperable Parallel Programming
LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Cooperative Task Management Without Manual Stack Management
ATEC '02 Proceedings of the General Track of the annual conference on USENIX Annual Technical Conference
Using hierarchical scheduling to support soft real-time applications in general-purpose operating systems
CQoS: a framework for enabling QoS in shared caches of CMP platforms
Proceedings of the 18th annual international conference on Supercomputing
Lightweight concurrency primitives for GHC
Haskell '07 Proceedings of the ACM SIGPLAN workshop on Haskell workshop
30 seconds is not enough!: a study of operating system timer usage
Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008
Multi-threading and one-sided communication in parallel LU factorization
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A scheduling framework for general-purpose parallel languages
Proceedings of the 13th ACM SIGPLAN international conference on Functional programming
Intel threading building blocks
Intel threading building blocks
ACM Transactions on Programming Languages and Systems (TOPLAS)
Scalable HMM based inference engine in large vocabulary continuous speech recognition
ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo
Tessellation: space-time partitioning in a manycore client OS
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Optimizing collective communication on multicores
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Parallelism orchestration using DoPE: the degree of parallelism executive
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
ICCSA'11 Proceedings of the 2011 international conference on Computational science and Its applications - Volume Part V
Parcae: a system for flexible parallel execution
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
A meta-scheduler for the par-monad: composable scheduling for the heterogeneous cloud
Proceedings of the 17th ACM SIGPLAN international conference on Functional programming
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
High-level support for pipeline parallelism on many-core architectures
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Holistic run-time parallelism management for time and energy efficiency
Proceedings of the 27th international ACM conference on International conference on supercomputing
Tessellation: refactoring the OS around explicit resource containers with continuous adaptation
Proceedings of the 50th Annual Design Automation Conference
Efficient multiprogramming for multicores with SCAF
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Applications composed of multiple parallel libraries perform poorly when those libraries interfere with one another by obliviously using the same physical cores, leading to destructive resource oversubscription. This paper presents the design and implementation of Lithe, a low-level substrate that provides the basic primitives and a standard interface for composing parallel codes efficiently. Lithe can be inserted underneath the runtimes of legacy parallel libraries to provide bolt-on composability without needing to change existing application code. Lithe can also serve as the foundation for building new parallel abstractions and libraries that automatically interoperate with one another. In this paper, we show versions of Threading Building Blocks (TBB) and OpenMP perform competitively with their original implementations when ported to Lithe. Furthermore, for two applications composed of multiple parallel libraries, we show that leveraging our substrate outperforms their original, even expertly tuned, implementations.