Data flow equations for explicitly parallel programs
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Static single assignment for explicitly parallel programs
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Pipeline gating: speculation control for energy reduction
Proceedings of the 25th annual international symposium on Computer architecture
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
Reducing the complexity of the issue logic
ICS '01 Proceedings of the 15th international conference on Supercomputing
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
OpenMP: An Industry-Standard API for Shared-Memory Programming
IEEE Computational Science & Engineering
A Unified Formalization of Four Shared-Memory Models
IEEE Transactions on Parallel and Distributed Systems
Micro-Threading: A New Approach to Future RISC
ACAC '00 Proceedings of the 5th Australasian Computer Architecture Conference
Supporting Fine-Grained Synchronization on a Simultaneous Multithreading Processor
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Out-of-Order Execution may not be Cost-Effective on Processors Featuring Simultaneous Multithreading
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Power-Sensitive Multithreaded Architecture
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Banked multiported register files for high-frequency superscalar microprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Compilation techniques for explicitly parallel programs
Compilation techniques for explicitly parallel programs
The energy efficiency of CMP vs. SMT for multimedia workloads
Proceedings of the 18th annual international conference on Supercomputing
Understanding the energy efficiency of simultaneous multithreading
Proceedings of the 2004 international symposium on Low power electronics and design
"Flea-flicker" Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Power-performance considerations of parallel computing on chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
We investigate extremely fine-grain multithreading as a means for improving energy efficiency of single-task program execution.Our work is based on low-overhead threads executing an explicitly parallel program in a register-sharing context. The thread-based parallelism takes the place of instruction-level parallelism, allowing us to use simple and more energy-efficient in-order pipelines while retaining performance that is characteristic of classical out-of-order processors. Our evaluation shows that in energy terms, the parallelized code running over in-order pipelines can outperform both plain in-order and out-of-order processors.