Communication optimizations for global multi-threaded instruction scheduling
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Parallel-stage decoupled software pipelining
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Scheduling strategies for optimistic parallel execution of irregular programs
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Copy or Discard execution model for speculative parallelization on multicores
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
Exploiting Speculative TLP in Recursive Programs by Dynamic Thread Prediction
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Dynamic parallelization of single-threaded binary programs using speculative slicing
Proceedings of the 23rd international conference on Supercomputing
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Boosting single-thread performance in multi-core systems through fine-grain multi-threading
Proceedings of the 36th annual international symposium on Computer architecture
Speculative parallelization using software multi-threaded transactions
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Speculative parallelization of sequential loops on multicores
International Journal of Parallel Programming
Decoupled software pipelining creates parallelization opportunities
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Automatic parallelization of simulink applications
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
DAFT: decoupled acyclic fault tolerance
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Skewed pipelining for parallel simulink simulations
Proceedings of the Conference on Design, Automation and Test in Europe
Concurrent separation logic for pipelined parallelization
SAS'10 Proceedings of the 17th international conference on Static analysis
ReMAP: A Reconfigurable Heterogeneous Multicore Architecture
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Scalable Speculative Parallelization on Commodity Clusters
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Enhanced speculative parallelization via incremental recovery
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Runtime parallelization of legacy code on a transactional memory system
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
ALTER: exploiting breakable dependences for parallelization
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Parallel programming of general-purpose programs using task-based programming models
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Paragon: collaborative speculative loop execution on GPU and CPU
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
The HELIX project: overview and directions
Proceedings of the 49th Annual Design Automation Conference
Speculative separation for privatization and reductions
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Effective parallelization of loops in the presence of I/O operations
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
HELIX: automatic parallelization of irregular programs for chip multiprocessing
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Yada: Straightforward parallel programming
Parallel Computing
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Pipelining for cyclic control systems
Proceedings of the 16th international conference on Hybrid systems: computation and control
A compiler-level intermediate representation based binary analysis and rewriting system
Proceedings of the 8th ACM European Conference on Computer Systems
Proceedings of the 6th International Systems and Storage Conference
Load-balanced pipeline parallelism
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
HPar: A practical parallel parser for HTML--taming HTML complexities for parallel parsing
ACM Transactions on Architecture and Code Optimization (TACO)
Integrating profile-driven parallelism detection and machine-learning-based mapping
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
In recent years, microprocessor manufacturers have shifted their focus from single-core to multi-core processors. To avoid burdening programmers with the responsibility of parallelizing their applications, some researchers have advocated automatic thread extraction. A recently proposed technique, Decoupled Software Pipelining (DSWP), has demonstrated promise by partitioning loops into long-running, fine-grained threads organized into a pipeline. Using a pipeline organization and execution decoupled by inter-core communication queues, DSWP offers increased execution efficiency that is largely independent of inter-core communication latency. This paper proposes adding speculation to DSWP and evaluates an automatic approach for its implementation. By speculating past infrequent dependences, the benefit of DSWP is increased by making it applicable to more loops, facilitating better balanced threads, and enabling parallelized loops to be run on more cores. Unlike prior speculative threading proposals, speculative DSWP focuses on breaking dependence recurrences. By speculatively breaking these recurrences, instructions that were formerly restricted to a single thread to ensure decoupling are now free to span multiple threads. Using an initial automatic compiler implementation and a validated processor model, this paper demonstrates significant gains using speculation for 4-core chip multiprocessor models running a variety of codes.