Thread decomposition is critical for pipelined multithreading (PMT) to achieve high performance on target multi-core processors. This paper presents an automatic thread decomposition approach that maps the decomposition problem onto a graph-theoretic framework and constructs an optimised directed acyclic graph (DAG) whose bottleneck node is as small as possible and whose node sizes are balanced. In this approach, control dependence is treated as a special form of data dependence, and an effective method is proposed to remove redundant control dependences. A weighted DAG is constructed by assigning appropriate weights to all nodes and all dependences according to profile information. An automatic thread decomposition algorithm then generates an optimised pipeline from the weighted DAG. The algorithm has been evaluated on a commodity multi-core processor, and experimental results show speedups ranging from 113% to 174% on some SPEC CPU 2000 benchmark programs.
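To make the idea of a pipeline with a minimal bottleneck and balanced stages concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the dependence graph is already acyclic, that control and data dependences have been merged into a single predecessor map, and that node weights come from a profile. It ignores dependence (communication) weights entirely. The names `partition_pipeline`, `weights`, and `deps` are hypothetical. The sketch orders nodes topologically and then splits that order into contiguous stages so that the heaviest stage is as light as possible; contiguous cuts of a topological order guarantee that every dependence flows from an earlier stage to the same or a later stage.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def partition_pipeline(weights, deps, num_stages):
    """Split a weighted DAG into pipeline stages with a minimal bottleneck.

    weights: dict mapping node -> profiled cost (hypothetical input format)
    deps:    dict mapping node -> iterable of predecessor nodes
    Returns (stages, bottleneck), where stages is a list of node lists.
    """
    # Any topological order keeps dependences pointing forward.
    graph = {node: deps.get(node, ()) for node in weights}
    order = list(TopologicalSorter(graph).static_order())
    n = len(order)
    k_max = min(num_stages, n)  # never create empty stages

    # prefix[i] = total weight of the first i nodes in the order
    prefix = [0]
    for node in order:
        prefix.append(prefix[-1] + weights[node])

    inf = float("inf")
    # best[k][i] = smallest bottleneck when the first i nodes form k stages
    best = [[inf] * (n + 1) for _ in range(k_max + 1)]
    cut = [[0] * (n + 1) for _ in range(k_max + 1)]
    best[0][0] = 0
    for k in range(1, k_max + 1):
        for i in range(1, n + 1):
            for j in range(k - 1, i):
                cost = max(best[k - 1][j], prefix[i] - prefix[j])
                if cost < best[k][i]:
                    best[k][i] = cost
                    cut[k][i] = j

    # Walk the recorded cut points backwards to recover the stages.
    stages = []
    i = n
    for k in range(k_max, 0, -1):
        j = cut[k][i]
        stages.append(order[j:i])
        i = j
    stages.reverse()
    return stages, best[k_max][n]
```

For a four-node chain with weights 4, 1, 1, 4 and two stages, the sketch places the cut in the middle, giving two stages of weight 5 each. A fuller treatment would also weigh the dependences crossing each cut, as the weighted DAG described above does.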