Determining average program execution times and their variance
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Predicting program behavior using real or estimated profiles
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Static and dynamic evaluation of data dependence analysis
ICS '93 Proceedings of the 7th international conference on Supercomputing
The anatomy of the register file in a multiscalar processor
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References
IEEE Transactions on Computers
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
ICS '98 Proceedings of the 12th international conference on Supercomputing
Task selection for a multiscalar processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Improving the performance of speculatively parallel applications on the Hydra CMP
ICS '99 Proceedings of the 13th international conference on Supercomputing
Partitioning parallel programs for macro-dataflow
LFP '86 Proceedings of the 1986 ACM conference on LISP and functional programming
Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems
Journal of the ACM (JACM)
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
Static scheduling algorithms for allocating directed task graphs to multiprocessors
ACM Computing Surveys (CSUR)
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor
ICS '01 Proceedings of the 15th international conference on Supercomputing
Reference idempotency analysis: a framework for optimizing speculative execution
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
A general compiler framework for speculative multithreading
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Computer
Using thread-level speculation to simplify manual parallelization
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
JavaSpMT: A Speculative Thread Pipelining Parallelization Model for Java Programs
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Hardware support for thread-level speculation
Hardware support for thread-level speculation
The structure of a compiler for explicit and implicit parallelism
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Automatic Thread Extraction with Decoupled Software Pipelining
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Chip multi-processor scalability for single-threaded applications
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
SableSpMT: a software framework for analysing speculative multithreading in Java
PASTE '05 Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
POSH: a TLS compiler that exploits program structure
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
A lifetime optimal algorithm for speculative PRE
ACM Transactions on Architecture and Code Optimization (TACO)
Implicit parallelism with ordered transactions
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Speculative thread decomposition through empirical optimization
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hybrid multi-core architecture for boosting single-threaded performance
ACM SIGARCH Computer Architecture News
The potential of trace-level parallelism in Java programs
Proceedings of the 5th international symposium on Principles and practice of programming in Java
Incrementally parallelizing database transactions with thread-level speculation
ACM Transactions on Computer Systems (TOCS)
Quasi-static scheduling for safe futures
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Fetch-Criticality Reduction through Control Independence
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Proceedings of the 4th workshop on Declarative aspects of multicore programming
Compiler-Driven Dependence Profiling to Guide Program Parallelization
Languages and Compilers for Parallel Computing
A study of potential parallelism among traces in Java programs
Science of Computer Programming
Exploiting Speculative TLP in Recursive Programs by Dynamic Thread Prediction
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Dynamic parallelization of single-threaded binary programs using speculative slicing
Proceedings of the 23rd international conference on Supercomputing
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Alchemist: A Transparent Dependence Distance Profiling Infrastructure
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Dynamic performance tuning for speculative threads
Proceedings of the 36th annual international symposium on Computer architecture
Boosting single-thread performance in multi-core systems through fine-grain multi-threading
Proceedings of the 36th annual international symposium on Computer architecture
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Can transactions enhance parallel programs?
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Exploiting speculative thread-level parallelism in data compression applications
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Supporting speculative parallelization in the presence of dynamic data structures
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Exploiting coarse-grain speculative parallelism
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Loop selection for thread-level speculation
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Single thread program parallelism with dataflow abstracting thread
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Dynamically dispatching speculative threads to improve sequential execution
ACM Transactions on Architecture and Code Optimization (TACO)
Optimizing software runtime systems for speculative parallelization
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
ACM Transactions on Architecture and Code Optimization (TACO)
Efficient execution of speculative threads and transactions with hardware transactional memory
Future Generation Computer Systems
A thread partitioning approach for speculative multithreading
The Journal of Supercomputing
Hi-index | 0.00 |
With billion-transistor chips on the horizon, single-chip multiprocessors (CMPs) are likely to become commodity components. Speculative CMPs use hardware to enforce dependence, allowing the compiler to improve performance by speculating on ambiguous dependences without absolute guarantees of independence. The compiler is responsible for decomposing a sequential program into speculatively parallel threads, while considering multiple performance overheads related to data dependence, load imbalance, and thread prediction. Although the decomposition problem lends itself to a min-cut-based approach, the overheads depend on the thread size, requiring the edge weights to be changed as the algorithm progresses. The changing weights make our approach different from graph-theoretic solutions to the general problem of task scheduling. One recent work uses a set of heuristics, each targeting a specific overhead in isolation, and gives precedence to thread prediction, without comparing the performance of the threads resulting from each heuristic. By contrast, our method uses a sequence of balanced min-cuts that give equal consideration to all the overheads, and adjusts the edge weights after every cut. This method achieves an (geometric) average speedup of 74% for floating-point programs and 23% for integer programs on a four-processor chip, improving on the 52% and 13% achieved by the previous heuristics.