Empowering a helper cluster through data-width aware instruction selection policies

Authors:
Osman S. Unsal;Oguz Ergin;Xavier Vera;Antonio González
Affiliations:
Intel Barcelona Research Center, Intel Labs, Universitat Politècnica de Catalunya, Barcelona, Spain;Department of Computer Engineering, TOBB Univ. of Economics and Technology, Ankara, Turkey;Intel Barcelona Research Center, Intel Labs, Universitat Politècnica de Catalunya, Barcelona, Spain;Intel Barcelona Research Center, Intel Labs, Universitat Politècnica de Catalunya, Barcelona, Spain
Venue:
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Year:
2006

Abstract

Narrow values, which can be represented with fewer bits than the full machine width, occur very frequently in programs. Clustering mechanisms, in turn, enable cost- and performance-effective scaling of processor back-end features. These two properties can be combined synergistically to design special clusters that operate on narrow values (a.k.a. Helper Clusters), potentially providing performance benefits. We complement a 32-bit monolithic processor with a low-complexity 8-bit Helper Cluster. Then, as our main focus, we propose several policies for selecting suitable instructions to execute in the data-width based clusters. We add data-width information as another instruction steering decision metric and introduce new data-width based selection algorithms that also consider dependency, inter-cluster communication, and load imbalance. Using these techniques, the performance of a wide range of workloads is substantially increased: the Helper Cluster achieves an average speedup of 11% across a wide range of 412 apps. For integer applications, the average speedup reaches 22%.
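
To make the steering idea concrete, the following is a minimal illustrative sketch, not the authors' algorithms. It assumes a value counts as narrow when it is a sign extension of its low 8 bits, and it combines the factors named in the abstract (data width, dependency and inter-cluster communication, load imbalance) in one simple priority order. The names `is_narrow`, `Instr`, `steer`, and `imbalance_limit` are hypothetical, and the sketch ignores the case where narrow sources produce a wide result.

```python
from dataclasses import dataclass

NARROW_BITS = 8    # Helper Cluster datapath width (from the abstract)
MACHINE_BITS = 32  # main cluster datapath width (from the abstract)


def is_narrow(value: int, bits: int = NARROW_BITS) -> bool:
    """Return True if a 32-bit two's-complement value is a sign
    extension of its low `bits` bits (assumed narrowness criterion)."""
    value &= (1 << MACHINE_BITS) - 1
    # Keep the narrow field's sign bit and every bit above it.
    upper = value >> (bits - 1)
    all_ones = (1 << (MACHINE_BITS - bits + 1)) - 1
    return upper == 0 or upper == all_ones


@dataclass
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    src_values: tuple         # known or predicted source operand values
    producer_clusters: tuple  # "main", "helper", or None if already available


def steer(instr: Instr, helper_load: int, main_load: int,
          imbalance_limit: int = 4) -> str:
    """Pick a cluster for one instruction (illustrative policy only)."""
    # 1. Data width: any wide source forces the 32-bit main cluster.
    if not all(is_narrow(v) for v in instr.src_values):
        return "main"
    # 2. Dependency / communication: stay with a producer in the main
    #    cluster to avoid an inter-cluster copy.
    if any(p == "main" for p in instr.producer_clusters):
        return "main"
    # 3. Load imbalance: do not pile work onto an overloaded helper.
    if helper_load - main_load >= imbalance_limit:
        return "main"
    return "helper"
```

For example, `steer(Instr((3, -4), (None, "helper")), 0, 0)` selects the helper cluster, while the same call with a source value of 300 falls back to the main cluster. The threshold of 4 is an arbitrary placeholder; the abstract does not state how the factors are weighted.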