Data speculation support for a chip multiprocessor

Authors:
Lance Hammond;Mark Willey;Kunle Olukotun
Affiliations:
Computer Systems Laboratory, Stanford University, Stanford, CA;Computer Systems Laboratory, Stanford University, Stanford, CA;Computer Systems Laboratory, Stanford University, Stanford, CA
Venue:
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Year:
1998

Abstract

Thread-level speculation is a technique that enables parallel execution of sequential applications on a multiprocessor. This paper describes the complete implementation of the support for thread-level speculation on the Hydra chip multiprocessor (CMP). The support consists of a number of software speculation control handlers and modifications to the shared secondary cache memory system of the CMP. This support is evaluated using five representative integer applications. Our results show that the speculative support is only able to improve performance when there is a substantial amount of medium-grained loop-level parallelism in the application. When the granularity of parallelism is too small or there is little inherent parallelism in the application, the overhead of the software handlers overwhelms any potential performance benefits from speculative-thread parallelism. Overall, thread-level speculation still appears to be a promising approach for expanding the class of applications that can be automatically parallelized, but more hardware-intensive implementations for managing speculation control are required to achieve performance improvements on a wide class of integer applications.
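
The general idea of loop-level thread speculation summarized in the abstract can be illustrated with a small software model. The sketch below is not the Hydra implementation (which relies on software handlers and changes to the shared secondary cache); it is a sequential Python simulation written under simplifying assumptions. All names (`SpeculativeThread`, `run_speculative`, `window`, `double_body`, `sum_body`) are hypothetical. Each loop iteration runs as a speculative thread that buffers its stores and records the addresses it loads; threads commit in program order, and a thread that loaded an address written by an earlier thread in the same window is restarted.

```python
from typing import Callable, Dict, Set

Memory = Dict[str, int]


class SpeculativeThread:
    """One loop iteration executed speculatively (illustrative model only)."""

    def __init__(self, index: int, memory: Memory) -> None:
        self.index = index
        self.memory = memory                 # committed (non-speculative) state
        self.read_set: Set[str] = set()      # addresses loaded from committed state
        self.write_buffer: Memory = {}       # stores held back until commit

    def load(self, addr: str) -> int:
        # A thread always sees its own buffered stores first.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        self.read_set.add(addr)
        return self.memory[addr]

    def store(self, addr: str, value: int) -> None:
        self.write_buffer[addr] = value

    def run(self, body: Callable[["SpeculativeThread", int], None]) -> None:
        # Discard any earlier speculative state, then (re-)execute the iteration.
        self.read_set = set()
        self.write_buffer = {}
        body(self, self.index)


def run_speculative(body: Callable[[SpeculativeThread, int], None],
                    n: int, memory: Memory, window: int = 4) -> int:
    """Run n iterations, `window` at a time; return the number of restarts."""
    restarts = 0
    for start in range(0, n, window):
        threads = [SpeculativeThread(i, memory)
                   for i in range(start, min(start + window, n))]
        # "Parallel" phase: nothing has committed yet, so every thread in the
        # window observes the same committed memory.
        for t in threads:
            t.run(body)
        # Commit phase, in original program order.
        written: Set[str] = set()
        for t in threads:
            if t.read_set & written:
                # Read-after-write violation: the thread loaded a value that an
                # earlier thread has since overwritten, so it must re-execute.
                t.run(body)
                restarts += 1
            memory.update(t.write_buffer)
            written.update(t.write_buffer)
    return restarts


def double_body(t: SpeculativeThread, i: int) -> None:
    # Independent iterations: each touches only its own element.
    t.store(f"a{i}", 2 * t.load(f"a{i}"))


def sum_body(t: SpeculativeThread, i: int) -> None:
    # Loop-carried dependence through "total".
    t.store("total", t.load("total") + t.load(f"a{i}"))
```

In this model, a loop with independent iterations (such as `double_body`) commits without restarts, while a loop with a carried dependence (such as `sum_body`) restarts most iterations and so gains nothing from speculation. This loosely mirrors the abstract's observation that speculation pays off only when enough parallelism is actually present.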