The expandable split window paradigm for exploiting fine-grain parallelsim
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The multiscalar architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Speculative multithreaded processors
ICS '98 Proceedings of the 12th international conference on Supercomputing
Task selection for a multiscalar processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Improving the performance of speculatively parallel applications on the Hydra CMP
ICS '99 Proceedings of the 13th international conference on Supercomputing
The Superthreaded Processor Architecture
IEEE Transactions on Computers
Value prediction for speculative multithreaded architectures
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Architecture of the Atlas Chip-Multiprocessor: Dynamically Parallelizing Irregular Applications
IEEE Transactions on Computers
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Compiler optimization of scalar value communication between speculative threads
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Tuning the Pentium Pro Microarchitecture
IEEE Micro
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Control Flow Speculation in Multiscalar Processors
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Performance Study of a Concurrent Multithreaded Processor
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
On Dynamic Speculative Thread Partitioning and the MEM-Slicing Algorithm
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
A Quantitative Assessment of Thread-Level Speculation Techniques
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Thread-Spawning Schemes for Speculative Multithreading
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
A Minimal Dual-Core Speculative Multi-Threading Architecture
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Control Flow Optimization Via Dynamic Reconvergence Prediction
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Exposing speculative thread parallelism in SPEC2000
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Scalable Load and Store Processing in Latency Tolerant Processors
Proceedings of the 32nd annual international symposium on Computer Architecture
Out-of-Order Commit Processors
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Transparent control independence (TCI)
Proceedings of the 34th annual international symposium on Computer architecture
Measuring the Parallelism Available for Very Long Instruction Word Architectures
IEEE Transactions on Computers
The Inhibition of Potential Parallelism by Conditional Jumps
IEEE Transactions on Computers
The potential of using dynamic information flow analysis in data value prediction
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Analysis of execution efficiency in the microthreaded processor UTLEON3
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
Leveraging Strength-Based Dynamic Information Flow Analysis to Enhance Data Value Prediction
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
High performance superscalar architectures used to exploit instruction level parallelism in single-thread applications have become too complex and too power hungry for the many-core processors era. We propose a new architecture that uses multiple latency-tolerant in-order cores to improve single-thread performance, without requiring complex out-of-order execution hardware or large, power hungry register files and instruction buffers. Using simple cores to provide improved single-thread performance for conventional difficult-to-parallelize applications allows designers to place many more of these cores on the same die. Consequently, emerging highly parallel applications can take full advantage of the many-core parallel hardware without sacrificing performance of inherently serial applications. Our architecture splits single-thread program execution into disjoint control and data threads that execute concurrently on multiple latency-tolerant in-order cores. Hence we call this style of execution Disjoint Out-of-Order Execution (DOE). DOE is a novel implementation of Speculative Multithreading (SpMT). It uses latency tolerance to overcome performance issues of SpMT caused by load imbalance and inter-thread data communication delays. Using control independence prediction hardware to spawn threads, we simulate the potential performance of DOE on a subset of Spec2000 integer benchmarks under various parallelism scenarios and for DOE configurations of 2, 4, 6 and 8 single-issue latency tolerant cores.