Run-Time Parallelization and Scheduling of Loops
IEEE Transactions on Computers
PTRAN—the IBM parallel translation system
Parallel functional languages and compilers
Experiences using the ParaScope Editor: an interactive parallel programming tool
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
An integrated compilation and performance analysis environment for data parallel programs
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Array SSA form and its use in parallelization
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Measuring the effectiveness of automatic parallelization in SUIF
ICS '98 Proceedings of the 12th international conference on Supercomputing
Advanced compiler design and implementation
Advanced compiler design and implementation
Task selection for a multiscalar processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
SUIF Explorer: an interactive and interprocedural parallelizer
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving the performance of speculatively parallel applications on the Hydra CMP
ICS '99 Proceedings of the 13th international conference on Supercomputing
Clustered speculative multithreaded processors
ICS '99 Proceedings of the 13th international conference on Supercomputing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Automatic loop transformations and parallelization for Java
Proceedings of the 14th international conference on Supercomputing
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
High performance Fortran compilation techniques for parallelizing scientific codes
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Techniques for speculative run-time parallelization of loops
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Compiler optimization of scalar value communication between speculative threads
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Effective Cross-Platform, Multilevel Parallelism via Dynamic Adaptive Execution
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Polaris: Improving the Effectiveness of Parallelizing Compilers
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Targeting Dynamic Compilation for Embedded Environments
Proceedings of the 2nd Java Virtual Machine Research and Technology Symposium
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Exploiting Method-Level Parallelism in Single-Threaded Java Programs
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
In Search of Speculative Thread-Level Parallelism
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Containers on the Parallelization of General-Purpose Java Programs
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Improving Value Communication for Thread-Level Speculation
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
The Jrpm system for dynamically parallelizing Java programs
Proceedings of the 30th annual international symposium on Computer architecture
A cost-driven compilation framework for speculative parallelization of sequential programs
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Compiler Estimation of Load Imbalance Overhead in Speculative Parallelization
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Object-Distribution Analysis for Program Decomposition and Re-Clustering
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 13 - Volume 14
TAPE: a transactional application profiling environment
Proceedings of the 19th annual international conference on Supercomputing
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A compiler cost model for speculative parallelization
ACM Transactions on Architecture and Code Optimization (TACO)
The potential of trace-level parallelism in Java programs
Proceedings of the 5th international symposium on Principles and practice of programming in Java
Modeling optimistic concurrency using quantitative dependence analysis
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
A study of potential parallelism among traces in Java programs
Science of Computer Programming
Alchemist: A Transparent Dependence Distance Profiling Infrastructure
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
On-the-fly detection of precise loop nests across procedures on a dynamic binary translation system
Proceedings of the 8th ACM International Conference on Computing Frontiers
Loop selection for thread-level speculation
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
HydraVM: extracting parallelism from legacy sequential code using STM
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
CUBIT: compact bitmap profiling for dynamic data dependence analysis
Proceedings of the 2013 Research in Adaptive and Convergent Systems
A thread partitioning approach for speculative multithreading
The Journal of Supercomputing
Hi-index | 0.00 |
Thread-level speculation (TLS) allows sequential programs to be arbitrarily decomposed into threads that can be safely executed in parallel. A key challenge for TLS processors is choosing thread decompositions that speedup the program. Current techniques for identifying decompositions have practical limitations in real systems. Traditional parallelizing compilers do not work effectively on most integer programs, and software profiling slows down program execution too much for real-time analysis.Tracer for Extracting Speculative Threads (TEST) is hardware support that analyzes sequential program execution to estimate performance of possible thread decompositions. This hardware is used in a dynamic parallelization system that automatically transforms unmodified, sequential Java programs to run on TLS processors. In this system, the best thread decompositions found by TEST are dynamically recompiled to run speculatively. This paper describes the analysis performed by TEST and presents simulation results demonstrating its effectiveness on real programs. Estimates are also provided that show the tracer requires minimal hardware additions to our speculative chipmultiprocessor (