Computer
Effect of storage allocation/reclamation methods on parallelism and storage requirements
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Resource requirements of dataflow programs
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
MASA: a multithreaded processor architecture for parallel symbolic computing
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
An open environment for building parallel programming systems
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
The horizon supercomputing system: architecture and software
Proceedings of the 1988 ACM/IEEE conference on Supercomputing
A future-based parallel language for a general-purpose highly-parallel computer
Selected papers of the second workshop on Languages and compilers for parallel computing
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
ICS '90 Proceedings of the 4th international conference on Supercomputing
A critique of multiprocessing von Neumann style
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
StaCS: a Static Control Superscalar architecture
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Experience with fine-grain synchronization in MIMD machines for preconditioned conjugate gradient
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Compressionless Routing: A Framework for Adaptive and Fault-Tolerant Routing
IEEE Transactions on Parallel and Distributed Systems
A Cost and Speed Model for k-ary n-Cube Wormhole Routers
IEEE Transactions on Parallel and Distributed Systems
The MIT Alewife machine: architecture and performance
25 years of the international symposia on Computer architecture (selected papers)
Evaluating titanium SPMD programs on the Tera MTA
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
SAC '97 Proceedings of the 1997 ACM symposium on Applied computing
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
Early Experience with Scientific Programs on the Cray MTA-2
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Accelerating database operators using a network processor
DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware
HP scalable computing architecture
WIESS'00 Proceedings of the 1st conference on Industrial Experiences with Systems Software - Volume 1
MIPS MT: a multithreaded RISC architecture for embedded real-time processing
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Understanding throughput-oriented architectures
Communications of the ACM
Scheduling task parallelism on multi-socket multicore systems
Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers
Exploring irregular memory accesses on FPGAs
Proceedings of the first workshop on Irregular applications: architectures and algorithms
OpenMP task scheduling strategies for multicore NUMA systems
International Journal of High Performance Computing Applications
Parallel solution of the subset-sum problem: an empirical study
Concurrency and Computation: Practice & Experience
Compiled multithreaded data paths on FPGAs for dynamic workloads
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
This paper describes an integrated architecture, compiler, runtime, and operating system solution for exploiting heterogeneous parallelism. The architecture is a pipelined multi-threaded multiprocessor that supports parallel activities ranging from very fine (multiple operations within an instruction) to very coarse (multiple jobs). The compiler and runtime focus on managing parallelism within a job, while the operating system focuses on managing parallelism across jobs. By considering the entire system in the design, we were able to interface its four components smoothly. While each component is primarily responsible for managing its own level of parallel activity, feedback mechanisms between components enable resource allocation and usage to be updated dynamically. This dynamic adaptation to changing requirements and available resources fosters both high utilization of the machine and the efficient expression and execution of parallelism.