The search for energy efficiency in the design of embedded systems is leading toward CPUs with higher instruction-level parallelism (ILP) and data-level parallelism (DLP). Unfortunately, individual applications do not have sufficient parallelism to keep all these CPU resources busy. Since embedded systems often consist of multiple tasks, task-level parallelism can be used to fill the gap. Simultaneous multi-threading (SMT) has proved a valuable technique for this purpose in high-performance systems, but it cannot be afforded in systems with tight energy budgets. Moreover, it does not exploit data-level parallel hardware, nor does it exploit the information available on threads. We propose software-SMT (SW-SMT), a technique that exploits task-level parallelism to improve the utilization of both instruction-level and data-level parallel hardware, thereby improving performance. The technique compiles multiple threads together at design time and includes a run-time selection of the most efficient mixes. We have applied the technique to two major blocks of a software-defined radio (SDR) application, achieving energy gains of up to 46% on different ILP and DLP architectures. We show that the potential of SW-SMT increases with SIMD datapath size and VLIW issue width.
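The run-time selection step can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: it assumes that design-time compilation yields a table of pre-merged thread mixes, each with an energy estimate, and that the run-time selector picks the mix that saves the most energy compared with running the same tasks one after another. The names `Mix`, `select_mix`, `task_a`, `task_b`, `task_c` and all energy figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mix:
    """One pre-compiled mix (hypothetical structure, not from the paper)."""
    tasks: frozenset  # names of the tasks merged into a single schedule
    energy: float     # assumed energy estimate for one run of the mix


def select_mix(ready, mixes, solo_energy):
    """Return the mix with the largest energy saving over sequential
    execution of its tasks, or None if no applicable mix saves energy."""
    best, best_saving = None, 0.0
    for mix in mixes:
        # A mix is applicable only if all of its tasks are ready to run.
        if not mix.tasks <= ready:
            continue
        saving = sum(solo_energy[t] for t in mix.tasks) - mix.energy
        if saving > best_saving:
            best, best_saving = mix, saving
    return best


# Made-up energy figures, in arbitrary units, for illustration only.
SOLO = {"task_a": 10.0, "task_b": 14.0, "task_c": 4.0}
MIXES = [
    Mix(frozenset({"task_a", "task_b"}), 18.0),  # saves 6.0
    Mix(frozenset({"task_a", "task_c"}), 13.0),  # saves 1.0
    Mix(frozenset({"task_b", "task_c"}), 19.0),  # costs 1.0 more
]

chosen = select_mix({"task_a", "task_b", "task_c"}, MIXES, SOLO)
print(sorted(chosen.tasks))
```

When no mix beats sequential execution, `select_mix` returns `None` and the tasks would simply run on their own; a real selector would presumably also have to account for deadlines and task periods, which this sketch ignores.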