Proceedings of the 6th ACM conference on Computing frontiers
Scaling the number of cores in a multi-core processor constrains the resources available to each core, which reduces per-core performance. Alternatively, the number of cores has to be reduced in order to improve per-core performance. In this paper, we propose a technique to improve per-core performance in a many-core processor without reducing the number of cores. In particular, we integrate a Reconfigurable Hardware Unit (RHU) into each core. The RHU executes frequently encountered instructions to increase the core's overall execution bandwidth, thus improving its performance. We also propose a novel integrated hardware/software methodology for efficient RHU reconfiguration. The RHU has low area overhead and hence minimal impact on the scalability of the multi-core processor. Our experiments show that the proposed architecture improves per-core performance by about 12% on average across a wide range of applications, while incurring a per-core area overhead of only about 5%.