This paper proposes a lightweight reconfigurable dual-core architecture for embedded systems, called the hyperscalar dual-core architecture. The proposed architecture can play three different roles (a 2-issue statically scheduled superscalar processor, a homogeneous dual-core processor, or a standalone single-core processor), allowing embedded systems to accommodate diverse workloads. The design adds four extended instructions that let programmers switch the role of the architecture dynamically. This paper also presents an instruction analyzer (IA) that connects two scalar in-order cores and handles role switching. The IA makes it possible for the two cores to work together like a 2-issue statically scheduled superscalar processor. Following the proposed dispatching rules, the IA dispatches instructions with data dependencies to the same core. Because the two cores act like a statically scheduled superscalar processor, they can resolve data dependencies using existing forwarding paths, without introducing high-area-cost inter-core operand-switching crossbars. Simulation results show that, when operating as a statically scheduled superscalar processor, the proposed architecture achieves 26% higher instructions per cycle (IPC) than a scalar in-order core, averaged across all 29 benchmarks from the MiBench suite. Extending a homogeneous dual-core processor to a hyperscalar dual-core processor increases area and power by only 1.8% and 1.75%, respectively, in 90 nm CMOS technology.
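The dispatching idea stated above, that dependent instructions are sent to the same core so existing intra-core forwarding can resolve them, can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's dispatching rules, which the abstract does not give. The `Instr` type, the `dispatch` function, the "least-loaded core" tie-break for independent instructions, and the choice of following the first source operand's producer are all hypothetical simplifications introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, several sources."""
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def dispatch(instrs: List[Instr], num_cores: int = 2) -> List[int]:
    """Return a core index for each instruction (toy model, assumption only).

    An instruction that reads a register written by an earlier instruction
    is sent to the core that produced that register. An instruction with no
    tracked producer goes to the core with the fewest instructions so far.
    """
    last_writer: Dict[str, int] = {}   # register -> core that last wrote it
    load = [0] * num_cores             # instructions assigned per core
    assignment: List[int] = []

    for ins in instrs:
        producers = [last_writer[r] for r in ins.srcs if r in last_writer]
        if producers:
            # Simplification: follow the producer of the first dependent source.
            core = producers[0]
        else:
            core = load.index(min(load))
        assignment.append(core)
        load[core] += 1
        if ins.dest is not None:
            last_writer[ins.dest] = core
    return assignment


# Two independent dependency chains: (add -> mul) and (sub -> or).
program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("sub", "r4", ("r5", "r6")),
    Instr("mul", "r7", ("r1", "r8")),
    Instr("or", "r9", ("r4", "r2")),
]
assignment = dispatch(program)
```

In this toy run, each dependency chain stays on one core, so no operand ever has to cross between cores. A real statically scheduled design would also have to handle instructions whose sources were produced on different cores, which this sketch does not attempt to model.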