Two-level adaptive training branch prediction
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Alternative implementations of two-level adaptive branch prediction
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Improving the accuracy of dynamic branch prediction using branch correlation
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
A comparison of dynamic branch predictors that use two levels of branch history
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The effect of speculatively updating branch history on branch prediction accuracy, revisited
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Exceeding the dataflow limit via value prediction
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Trading conflict and capacity aliasing in conditional branch predictors
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Streamlining inter-operation memory communication via data dependence prediction
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
ACM Transactions on Computer Systems (TOCS)
Dynamic zero compression for cache energy reduction
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Performance improvement with circuit-level speculation
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
The optimum pipeline depth for a microprocessor
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Increasing processor performance by implementing deeper pipelines
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Efficient dynamic scheduling through tag elimination
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The Alpha 21264 Microprocessor
IEEE Micro
A study of branch prediction strategies
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Software-Controlled Operand-Gating
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Speculative software management of datapath-width for energy optimization
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
An asymmetric clustered processor based on value content
Proceedings of the 19th annual international conference on Supercomputing
Restrictive Compression Techniques to Increase Level 1 Cache Capacity
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Profiling over Adaptive Ranges
Proceedings of the International Symposium on Code Generation and Optimization
A case for a complexity-effective, width-partitioned microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
Exploiting Narrow Accelerators with Data-Centric Subgraph Mapping
Proceedings of the International Symposium on Code Generation and Optimization
Formulating and implementing profiling over adaptive ranges
ACM Transactions on Architecture and Code Optimization (TACO)
Asymmetrically banked value-aware register files for low-energy and high-performance
Microprocessors & Microsystems
Reducing parity generation latency through input value aware circuits
Proceedings of the 19th ACM Great Lakes symposium on VLSI
Multiplication acceleration through twin precision
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Exploiting narrow-width values for thermal-aware register file designs
Proceedings of the Conference on Design, Automation and Test in Europe
Characterization and exploitation of narrow-width loads: the narrow-width cache approach
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Empowering a helper cluster through data-width aware instruction selection policies
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
On the exploitation of narrow-width values for improving register file reliability
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Value compression for efficient computation
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Bit-sliced datapath for energy-efficient high performance microprocessors
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Exploiting narrow values for energy efficiency in the register files of superscalar microprocessors
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
Hi-index | 0.00 |
In a 64-bit processor, many of the data values actually used in computations require much narrower data-widths. In this study, we demonstrate that instruction data-widths exhibit very strong temporal locality and describe mechanisms to accurately predict data-widths.To exploit the predictability of data-widths, we propose a Multi-Bit-Width (MBW) microarchitecture which, when the opportunity arises, takes the wires normally used to route the operands and bypass the result of a 64-bit instruction, and instead uses them for multiple narrow-width instructions. This technique increases the effective issue width without adding many additional wires by reusing already existing datapaths.Compared to a traditional four-wide superscalar processor, our best MBW configuration with a peak issue rate of eight IPC achieves a 7.1% speedup on the simulated SPECint2000 benchmarks, which performs very well when compared to a 7.9% speedup attainable by a processor with a perfect data-width predictor.