We consider the task-based execution of parallel irregular applications, whose computational structure is unpredictable because it depends on the input data. Task pools can provide the dynamic load balancing needed to execute such applications efficiently. The performance of a task-based irregular application is therefore tightly coupled to the scalability and overhead of the task pool that executes it. To reduce this overhead, this article considers the use of the hardware-specific synchronization operations Compare & Swap and Load & Reserve/Store Conditional. We present several realizations of task pools that use these operations. Runtime experiments on two shared-memory machines, a SunFire 6800 and an IBM p690, show that the new implementations achieve significantly higher performance than implementations that rely on the POSIX thread library for synchronization.
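To make the idea of a Compare & Swap based task pool concrete, the following is a minimal sketch of a central lock-free LIFO pool in C++. It is not the article's implementation: the names `Task`, `CasTaskPool`, and `run_demo` are hypothetical, and the sketch assumes a fixed lifetime capacity so that list nodes are never recycled, which sidesteps the ABA problem that a production pool would have to handle. On architectures that offer Load & Reserve/Store Conditional instead of a native Compare & Swap, compilers typically translate `compare_exchange_weak` into a reserve/conditional-store retry loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical task type: an integer id stands in for real work.
struct Task { int id; };

// Sketch of a central task pool synchronized only by compare-and-swap.
// Assumption: nodes are preallocated and never reused, so ABA cannot occur.
class CasTaskPool {
    struct Node { Task task; Node* next; };
    std::vector<Node> nodes_;            // storage for all nodes, never recycled
    std::atomic<std::size_t> used_{0};   // index of the next unused node
    std::atomic<Node*> head_{nullptr};   // top of the LIFO list
public:
    explicit CasTaskPool(std::size_t capacity) : nodes_(capacity) {}

    // Returns false once the lifetime capacity is exhausted.
    bool push(Task t) {
        std::size_t slot = used_.fetch_add(1, std::memory_order_relaxed);
        if (slot >= nodes_.size()) return false;
        Node* n = &nodes_[slot];
        n->task = t;
        n->next = head_.load(std::memory_order_relaxed);
        // On failure, n->next is refreshed with the current head; retry.
        while (!head_.compare_exchange_weak(n->next, n,
                   std::memory_order_release, std::memory_order_relaxed)) {}
        return true;
    }

    // Returns false if the pool was observed to be empty.
    bool pop(Task& out) {
        Node* old = head_.load(std::memory_order_acquire);
        // On failure, old is refreshed with the current head; retry.
        while (old && !head_.compare_exchange_weak(old, old->next,
                   std::memory_order_acquire, std::memory_order_acquire)) {}
        if (!old) return false;
        out = old->task;
        return true;
    }
};

// Each worker pushes its tasks, then helps drain the pool.
// Returns the total number of tasks executed by all workers.
int run_demo(int workers, int tasks_per_worker) {
    CasTaskPool pool(static_cast<std::size_t>(workers) * tasks_per_worker);
    std::atomic<int> executed{0};
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([&pool, &executed, w, tasks_per_worker] {
            for (int i = 0; i < tasks_per_worker; ++i) {
                bool ok = pool.push(Task{w * tasks_per_worker + i});
                assert(ok);
                (void)ok;
            }
            Task t{};
            while (pool.pop(t)) executed.fetch_add(1);
        });
    }
    for (auto& th : threads) th.join();
    return executed.load();
}
```

Calling `run_demo(4, 1000)` starts four workers that each insert 1000 tasks and then remove tasks until the pool appears empty. A lock-based variant would wrap `push` and `pop` in a mutex from the POSIX thread library; here the retry loop around the atomic operation takes the place of the lock.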