PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
The expandable split window paradigm for exploiting fine-grain parallelsim
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Exploring the design space for a shared-cache multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The anatomy of the register file in a multiscalar processor
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Going Beyond Integer Programming with the Omega Test to Eliminate False Data Dependences
IEEE Transactions on Parallel and Distributed Systems
Boosting the performance of hybrid snooping cache protocols
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Idiom recognition in the Polaris parallelizing compiler
ICS '95 Proceedings of the 9th international conference on Supercomputing
Experience with efficient array data flow analysis for array privatization
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Speculative multithreaded processors
ICS '98 Proceedings of the 12th international conference on Supercomputing
Task selection for a multiscalar processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Nonlinear and Symbolic Data Dependence Testing
IEEE Transactions on Parallel and Distributed Systems
The Superthreaded Processor Architecture
IEEE Transactions on Computers
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
Parallel Programming with Polaris
Computer
Computer
The Future of Systems Research
Computer
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Hardware for Speculative Parallelization of Partially-Parallel Loops in DSM Multiprocessors
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
In Search of Speculative Thread-Level Parallelism
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Reference idempotency analysis: a framework for optimizing speculative execution
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Using thread-level speculation to simplify manual parallelization
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Toward efficient and robust software speculative parallelization on multiprocessors
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Implicitly-multithreaded processors
Proceedings of the 30th annual international symposium on Computer architecture
Min-cut program decomposition for thread-level speculation
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Compiler Estimation of Load Imbalance Overhead in Speculative Parallelization
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Reducing misspeculation overhead for module-level speculative execution
Proceedings of the 2nd conference on Computing frontiers
Exposing speculative thread parallelism in SPEC2000
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
The STAMPede approach to thread-level speculation
ACM Transactions on Computer Systems (TOCS)
Exploiting reference idempotency to reduce speculative storage overflow
ACM Transactions on Programming Languages and Systems (TOPLAS)
A compiler cost model for speculative parallelization
ACM Transactions on Architecture and Code Optimization (TACO)
On Characterizing Performance of the Cell Broadband Engine Element Interconnect Bus
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Incrementally parallelizing database transactions with thread-level speculation
ACM Transactions on Computer Systems (TOCS)
Trends toward on-chip networked microsystems
International Journal of High Performance Computing and Networking
A practical OpenMP compiler for system on chips
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
The structure of a compiler for explicit and implicit parallelism
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Complementing user-level coarse-grain parallelism with implicit speculative parallelism
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Recent proposals for Chip Multiprocessors (CMPs) advocate speculative, or implicit, threading in which the hardware employs prediction to peel off instruction sequences (i.e., implicit threads) from the sequential execution stream and speculatively executes them in parallel on multiple processor cores. These proposals augment a conventional multiprocessor, which employs explicit threading, with the ability to handle implicit threads. Current proposals focus on only implicitly-threaded code sections. This paper identifies, for the first time, the issues in combining explicit and implicit threading. We present the Multiplex architecture to combine the two threading models. Multiplex exploits the similarities between implicit and explicit threading, and provides a unified support for the two threading models without additional hardware. Multiplex groups a subset of protocol states in an implicitly-threaded CMP to provide a write-invalidate protocol for explicit threads.Using a fully-integrated compiler infrastructure for automatic generation of Multiplex code, this paper presents a detailed performance analysis for entire benchmarks, instead of just implicitly-threaded sections, as done in previous papers. We show that neither threading models alone performs consistently better than the other across the benchmarks. A CMP with four dual-issue CPUs achieves a speedup of 1.48 and 2.17 over one dual-issue CPU, using implicit-only and explicit-only threading, respectively. Multiplex matches or outperforms the better of the two threading models for every benchmark, and a four-CPU Multiplex achieves a speedup of 2.63. Our detailed analysis indicates that the dominant overheads in an implicitly-threaded CMP are speculation state overflow due to limited L1 cache capacity, and load imbalance and data dependences in fine-grain threads.