Rapid, low-power loop execution in a network of functional units

Authors:
Athanassios Tziouvaras; Georgios Dimitriou
Affiliations:
University of Thessaly; University of Thessaly
Venue:
Proceedings of the 17th Panhellenic Conference on Informatics
Year:
2013

Abstract

The need for high-performance computing and low-power operation has led to the emergence of new processor architectures, with most recent designs combining multiple cores with multiple threads per core. In our work, we explore an architecture of multiple instruction pipelines that merge into a common back end, formed as a network of functional units. This paper focuses on the back end and, in particular, on rapid, low-power execution of loops based on dataflow. We dispatch the loop body instructions onto the network of functional units only once and then let the loop execute in a dataflow manner, with no further instruction issue until the loop completes. In this way, we not only speed up loop execution but also save energy, since the entire front end of the pipeline is unused during loop execution and can be turned off. We have simulated the functional unit network at the microarchitecture level, running a number of Livermore loops. The results show that the proposed architecture can accelerate loop execution by a factor of up to N/k, for a network of N functional units, a loop body of N instructions, and an issue rate of k instructions per cycle.
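
To make the N/k bound concrete, the following is a minimal back-of-the-envelope sketch, not the authors' simulator. It assumes an idealized model that the abstract does not spell out: a conventional pipeline is limited purely by issue bandwidth (ceil(N/k) cycles per iteration), while the dataflow network pays the dispatch cost once and then, in the best case, completes one iteration per cycle. The function names and the one-iteration-per-cycle steady state are illustrative assumptions.

```python
from math import ceil


def baseline_cycles(n_instr: int, issue_rate: int, iterations: int) -> int:
    """Cycles for an issue-bound pipeline (assumption: every iteration
    re-issues all n_instr instructions at issue_rate per cycle)."""
    return iterations * ceil(n_instr / issue_rate)


def dataflow_cycles(n_instr: int, issue_rate: int, iterations: int) -> int:
    """Cycles for the dataflow network under an idealized assumption:
    the loop body is dispatched once, then one iteration completes
    per cycle with no further instruction issue."""
    dispatch = ceil(n_instr / issue_rate)  # one-time dispatch cost
    return dispatch + iterations


def speedup(n_instr: int, issue_rate: int, iterations: int) -> float:
    """Ratio of baseline cycles to dataflow cycles in this toy model."""
    return (baseline_cycles(n_instr, issue_rate, iterations)
            / dataflow_cycles(n_instr, issue_rate, iterations))


if __name__ == "__main__":
    n, k = 16, 4  # hypothetical: 16 functional units/instructions, 4-wide issue
    for iters in (1, 10, 100, 10_000):
        print(f"iterations={iters:>6}  speedup={speedup(n, k, iters):.3f}")
```

Under these assumptions the speedup approaches N/k from below as the iteration count grows, and a loop that runs only once is slower than the baseline because the dispatch cost is never amortized. Real loops with loop-carried dependencies or multi-cycle functional units would fall short of this bound.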