Since processor performance scalability will now mostly be achieved through thread-level parallelism, there is a strong incentive to parallelize a broad range of applications, including those with complex control flow and data structures. Yet writing parallel programs is a notoriously difficult task. Beyond processor performance, the architect can help by facilitating the task of the programmer, especially by simplifying the model exposed to the programmer.

In this article, among the many issues associated with writing parallel programs, we focus on finding the appropriate parallelism granularity, and efficiently mapping tasks with complex control and data flow to threads. We propose to relieve the user and compiler of both tasks by delegating the parallelization decision to the architecture at run-time, through a combination of hardware and software support and a tight dialogue between both.

For the software support, we leverage an increasingly popular approach in software engineering, called component-based programming; the component contract assumes tight encapsulation of code and data for easy manipulation. Previous research has shown that it is possible to augment components with the ability to split/spawn, providing a simple and fitting approach for programming parallel applications with complex control and data structures. However, such environments still require the programmer to determine the appropriate granularity of parallelism, and spawning incurs significant overheads due to software run-time system management.

To address both issues, we provide an environment with the ability to spawn conditionally depending on available hardware resources, and we delegate spawning decisions and actions to the architecture. This conditional spawning is implemented through frequent hardware resource probing by the program. This, in turn, enables rapid adaptation to varying workload conditions, data sets and hardware resources. Furthermore, thanks to appropriate combined hardware and compiler support, the probing has no significant overhead on program performance.

We demonstrate this approach on an 8-context SMT, several non-trivial algorithms and re-engineered SPEC CINT2000 benchmarks, written using component syntax processed by our toolchain. We achieve speedups ranging from 1.1 to 3.0 on our test suite.
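The conditional spawning idea can be illustrated with a small software emulation. The sketch below is not the toolchain or hardware interface described above: `ContextPool`, `probe`, `release`, and `conditional_sum` are hypothetical names, and a lock-protected counter stands in for the hardware's view of free thread contexts. It only shows the control pattern: at each possible split point the program probes for a free context, spawns if one is granted, and otherwise continues sequentially, so the effective granularity follows resource availability rather than a programmer-chosen threshold.

```python
import threading


class ContextPool:
    """Hypothetical software stand-in for the hardware's free thread contexts."""

    def __init__(self, contexts):
        self._free = contexts
        self._lock = threading.Lock()

    def probe(self):
        """Try to reserve one free context; return True if one was granted."""
        with self._lock:
            if self._free > 0:
                self._free -= 1
                return True
            return False

    def release(self):
        """Give a context back once the spawned work has finished."""
        with self._lock:
            self._free += 1

    def free_contexts(self):
        with self._lock:
            return self._free


def conditional_sum(data, lo, hi, pool):
    """Sum data[lo:hi], splitting only when probe() grants a context."""
    if hi - lo <= 1:
        return sum(data[lo:hi])  # zero or one element: nothing left to split
    mid = (lo + hi) // 2
    if pool.probe():
        # A context is available: spawn the left half, keep the right half here.
        result = {}

        def worker():
            try:
                result["left"] = conditional_sum(data, lo, mid, pool)
            finally:
                pool.release()

        t = threading.Thread(target=worker)
        t.start()
        right = conditional_sum(data, mid, hi, pool)
        t.join()
        return result["left"] + right
    # Probe denied: run both halves sequentially in the current context.
    return conditional_sum(data, lo, mid, pool) + conditional_sum(data, mid, hi, pool)


if __name__ == "__main__":
    values = list(range(1000))
    # Assumption: 8 contexts in total, one already used by the main thread.
    pool = ContextPool(7)
    print(conditional_sum(values, 0, len(values), pool))
```

In this emulation each probe takes a lock and each spawn creates an operating-system thread, which is exactly the kind of software overhead the approach above delegates to the architecture; the sketch is meant to convey the decision logic, not the cost model.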