Exploiting fine-grained parallelism through a combination of hardware and software techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Facilitating superscalar processing via a combined static/dynamic register renaming scheme
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Exploiting dead value information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The energy complexity of register files
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
On pipelining dynamic instruction scheduling logic
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Two-level hierarchical register file organization for VLIW processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
An instruction set and microarchitecture for instruction level distributed processing
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Characterizing and predicting value degree of use
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Register File Design Considerations in Dynamically Scheduled Processors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
A Scalable Register File Architecture for Dynamically Scheduled Processors
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
A High Performance Two-Level Register File Organization
A High Performance Two-Level Register File Organization
Macro-op Scheduling: Relaxing Scheduling Loop Constraints
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Use-Based Register Caching with Decoupled Indexing
Proceedings of the 31st annual international symposium on Computer architecture
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Dependency Chain Clustered Microarchitecture
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research
IEEE Computer Architecture Letters
Energy-efficient renaming with register versioning
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
The Design and Evaluation of a Selective Way Based Trace Cache
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
WiDGET: Wisconsin decoupled grid execution tiles
Proceedings of the 37th annual international symposium on Computer architecture
Function units sharing between neighbor cores in CMP
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Bundled execution of recurring traces for energy-efficient general purpose processing
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Global register alias table: Boosting sequential program on multi-core
Future Generation Computer Systems
MorphCore: An Energy-Efficient Microarchitecture for High Performance ILP and High Throughput TLP
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
There is still much performance to be gained by out-of-order processors with wider issue widths. However, traditional methods of increasing issue width do not scale; that is, they drastically increase design complexity and power requirements. This paper introduces the braid, a compile-time identified entity that enables the execution core to scale to wider widths by exploiting the small fanout and short lifetime of values produced by the program. Braid processing requires identification by the compiler, minor extensions to the ISA, and support by the microarchitecture. The result from processing braids is performance within 9% of a very aggressive conventional out-of-order microarchitecture with almost the complexity of an in-order implementation.