ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences
Proceedings of the 24th annual international symposium on Computer architecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Simultaneous subordinate microthreading (SSMT)
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
The Superthreaded Processor Architecture
IEEE Transactions on Computers
Speculative precomputation: long-range prefetching of delinquent loads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
A stream compiler for communication-exposed architectures
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
IEEE Micro
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Microarchitectural exploration with Liberty
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Decoupled access/execute computer architectures
ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Proceedings of the 30th annual international symposium on Computer architecture
Speculative Data-Driven Multithreading
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Beating in-order stalls with "flea-flicker" two-pass pipelining
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
The liberty structural specification language: a high-level modeling language for component reuse
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Decoupled Software Pipelining with the Synchronization Array
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Automatically partitioning packet processing applications for pipelined architectures
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
The STAMPede approach to thread-level speculation
ACM Transactions on Computer Systems (TOCS)
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Automatic Thread Extraction with Decoupled Software Pipelining
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
"Flea-flicker" Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Efficiently Evaluating Speedup Using Sampled Processor Simulation
IEEE Computer Architecture Letters
A framework for unrestricted whole-program optimization
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Design and evaluation of a hierarchical decoupled architecture
The Journal of Supercomputing
Support for High-Frequency Streaming in CMPs
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
A profile-based tool for finding pipeline parallelism in sequential programs
Parallel Computing
OUTRIDER: efficient memory latency tolerance with decoupled strands
Proceedings of the 38th annual international symposium on Computer architecture
HELIX: automatic parallelization of irregular programs for chip multiprocessing
Proceedings of the Tenth International Symposium on Code Generation and Optimization
An automatic thread decomposition approach for pipelined multithreading
International Journal of High Performance Computing and Networking
Accelerating sequential programs on commodity multi-core processors
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
Any successful solution to using multicore processors to scale general-purpose program performance will have to contend with rising intercore communication costs while exposing coarse-grained parallelism. Recently proposed pipelined multithreading (PMT) techniques have been demonstrated to have general-purpose applicability and are also able to effectively tolerate inter-core latencies through pipelined interthread communication. These desirable properties make PMT techniques strong candidates for program parallelization on current and future multicore processors and understanding their performance characteristics is critical to their deployment. To that end, this paper evaluates the performance scalability of a general-purpose PMT technique called decoupled software pipelining (DSWP) and presents a thorough analysis of the communication bottlenecks that must be overcome for optimal DSWP scalability.