Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Techniques for reducing consistency-related communication in distributed shared-memory systems
ACM Transactions on Computer Systems (TOCS)
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References
IEEE Transactions on Computers
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
A dynamic multithreading processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Clustered speculative multithreaded processors
ICS '99 Proceedings of the 13th international conference on Supercomputing
The Superthreaded Processor Architecture
IEEE Transactions on Computers
An architecture for mostly functional languages
LFP '86 Proceedings of the 1986 ACM conference on LISP and functional programming
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
Techniques for speculative run-time parallelization of loops
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Software DSM Protocols that Adapt between Single Writer and Multiple Writer
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Hardware for Speculative Parallelization of Partially-Parallel Loops in DSM Multiprocessors
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
In Search of Speculative Thread-Level Parallelism
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor
ICS '01 Proceedings of the 15th international conference on Supercomputing
Removing architectural bottlenecks to the scalability of speculative parallelization
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Reference idempotency analysis: a framework for optimizing speculative execution
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
IEEE Transactions on Parallel and Distributed Systems
Handling of packet dependencies: a critical issue for highly parallel network processors
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Speculative synchronization: applying thread-level speculation to explicitly parallel applications
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Compiler optimization of scalar value communication between speculative threads
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Enhancing software reliability with speculative threads
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
TEST: a tracer for extracting speculative threads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Using thread-level speculation to simplify manual parallelization
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Toward efficient and robust software speculative parallelization on multiprocessors
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Implicitly-multithreaded processors
Proceedings of the 30th annual international symposium on Computer architecture
ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes
Proceedings of the 30th annual international symposium on Computer architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
The Jrpm system for dynamically parallelizing Java programs
Proceedings of the 30th annual international symposium on Computer architecture
Using Interaction Costs for Microarchitectural Bottleneck Analysis
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP
ACM Transactions on Architecture and Code Optimization (TACO)
Min-cut program decomposition for thread-level speculation
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
A cost-driven compilation framework for speculative parallelization of sequential programs
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
iWatcher: Efficient Architectural Support for Software Debugging
Proceedings of the 31st annual international symposium on Computer architecture
Interprocedural Probabilistic Pointer Analysis
IEEE Transactions on Parallel and Distributed Systems
Interaction cost and shotgun profiling
ACM Transactions on Architecture and Code Optimization (TACO)
Decoupled Software Pipelining with the Synchronization Array
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Scaling Up the Atlas Chip-Multiprocessor
IEEE Transactions on Computers
Efficient and flexible architectural support for dynamic monitoring
ACM Transactions on Architecture and Code Optimization (TACO)
Partially ordered epochs for thread-level speculation
Proceedings of the 2nd conference on Computing frontiers
Exposing speculative thread parallelism in SPEC2000
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Design Space Exploration of a Software Speculative Parallelization Scheme
IEEE Transactions on Parallel and Distributed Systems
The STAMPede approach to thread-level speculation
ACM Transactions on Computer Systems (TOCS)
Optimistic intra-transaction parallelism on chip multiprocessors
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation
Proceedings of the 19th annual international conference on Supercomputing
Thread-Level Speculation on a CMP can be energy efficient
Proceedings of the 19th annual international conference on Supercomputing
Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Speculative execution in a distributed file system
Proceedings of the twentieth ACM symposium on Operating systems principles
High-Performance and Low-Cost Dual-Thread VLIW Processor Using Weld Architecture Paradigm
IEEE Transactions on Parallel and Distributed Systems
Chip multi-processor scalability for single-threaded applications
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
SableSpMT: a software framework for analysing speculative multithreading in Java
PASTE '05 Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
Energy-Efficient Thread-Level Speculation
IEEE Micro
POSH: a TLS compiler that exploits program structure
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Exploiting distributed version concurrency in a transactional memory cluster
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Architectural Semantics for Practical Transactional Memory
Proceedings of the 33rd annual international symposium on Computer Architecture
Tolerating Dependences Between Large Speculative Threads Via Sub-Threads
Proceedings of the 33rd annual international symposium on Computer Architecture
Bulk Disambiguation of Speculative Threads in Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Proceedings of the 33rd annual international symposium on Computer Architecture
CAVA: Using checkpoint-assisted value prediction to hide L2 misses
ACM Transactions on Architecture and Code Optimization (TACO)
Exploiting reference idempotency to reduce speculative storage overflow
ACM Transactions on Programming Languages and Systems (TOPLAS)
Speculative execution in a distributed file system
ACM Transactions on Computer Systems (TOCS)
CAPSULE: Hardware-Assisted Parallel Execution of Component-Based Programs
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Implicit parallelism with ordered transactions
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Speculative thread decomposition through empirical optimization
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hardware support for software controlled multithreading
ACM SIGARCH Computer Architecture News
Mechanisms for store-wait-free multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Optimistic parallelism requires abstractions
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
The potential of trace-level parallelism in Java programs
Proceedings of the 5th international symposium on Principles and practice of programming in Java
Incrementally parallelizing database transactions with thread-level speculation
ACM Transactions on Computer Systems (TOCS)
Journal of Parallel and Distributed Computing
Quasi-static scheduling for safe futures
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Compiler optimizations for parallelizing general-purpose applications under thread-level speculation
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Accurate branch prediction for short threads
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Is the optimism in optimistic concurrency warranted?
HOTOS'07 Proceedings of the 11th USENIX workshop on Hot topics in operating systems
Compiler and hardware support for reducing the synchronization of speculative threads
ACM Transactions on Architecture and Code Optimization (TACO)
Software thread-level speculation: an optimistic library implementation
Proceedings of the 1st international workshop on Multicore software engineering
Fetch-Criticality Reduction through Control Independence
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A Non-blocking Multithreaded Architecture with Support for Speculative Threads
ICA3PP '08 Proceedings of the 8th international conference on Algorithms and Architectures for Parallel Processing
Scalable and reliable communication for hardware transactional memory
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
On the potential of latency tolerant execution in speculative multithreading
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Proceedings of the 4th workshop on Declarative aspects of multicore programming
Set-Congruence Dynamic Analysis for Thread-Level Speculation (TLS)
Languages and Compilers for Parallel Computing
Compiler-Driven Dependence Profiling to Guide Program Parallelization
Languages and Compilers for Parallel Computing
Languages and Compilers for Parallel Computing
A study of potential parallelism among traces in Java programs
Science of Computer Programming
Copy or Discard execution model for speculative parallelization on multicores
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Alchemist: A Transparent Dependence Distance Profiling Infrastructure
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Dynamic performance tuning for speculative threads
Proceedings of the 36th annual international symposium on Computer architecture
Tolerating latency in replicated state machines through client speculation
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Optimistic parallelism requires abstractions
Communications of the ACM - The Status of the P versus NP Problem
A lightweight in-place implementation for software thread-level speculation
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Using a configurable processor generator for computer architecture prototyping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Speculation for Parallelizing Runtime Checks
SSS '09 Proceedings of the 11th International Symposium on Stabilization, Safety, and Security of Distributed Systems
Speculative parallelization of sequential loops on multicores
International Journal of Parallel Programming
Exploiting speculative thread-level parallelism in data compression applications
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Speculative parallelization of partial reduction variables
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Cloud-TM: harnessing the cloud with distributed transactional memories
ACM SIGOPS Operating Systems Review
Supporting speculative parallelization in the presence of dynamic data structures
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Timetraveler: exploiting acyclic races for optimizing memory race recording
Proceedings of the 37th annual international symposium on Computer architecture
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Dynamic processors demand dynamic operating systems
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
SD3: A Scalable Approach to Dynamic Data-Dependence Profiling
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Scalable Speculative Parallelization on Commodity Clusters
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Hardware Support for Relaxed Concurrency Control in Transactional Memory
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Enhanced speculative parallelization via incremental recovery
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
A pattern language for parallelizing irregular algorithms
Proceedings of the 2010 Workshop on Parallel Programming Patterns
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Analysis and performance results of computing betweenness centrality on IBM Cyclops64
The Journal of Supercomputing
Parallel programming of general-purpose programs using task-based programming models
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Supporting speculative multithreading on simultaneous multithreaded processors
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Programmer-assisted automatic parallelization
Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research
Memory subsystem characterization in a 16-core snoop-based chip-multiprocessor architecture
HPCC'05 Proceedings of the First international conference on High Performance Computing and Communications
Limits of parallelism using dynamic dependency graphs
WODA '09 Proceedings of the Seventh International Workshop on Dynamic Analysis
Internally deterministic parallel algorithms can be fast
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Exploiting thread-level speculative parallelism with software value prediction
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Runtime automatic speculative parallelization
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Automatic speculative DOALL for clusters
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Compiler support for fine-grain software-only checkpointing
CC'12 Proceedings of the 21st international conference on Compiler Construction
HiRe: using hint & release to improve synchronization of speculative threads
Proceedings of the 26th ACM international conference on Supercomputing
Dynamically dispatching speculative threads to improve sequential execution
ACM Transactions on Architecture and Code Optimization (TACO)
Disjoint out-of-order execution processor
ACM Transactions on Architecture and Code Optimization (TACO)
An online profile guided optimization approach for speculative parallel threading
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Unifying thread-level speculation and transactional memory
Proceedings of the 13th International Middleware Conference
Automatic speculative parallelization of loops using polyhedral dependence analysis
Proceedings of the First International Workshop on Code OptimiSation for MultI and many Cores
Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
ASC: automatically scalable computation
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Efficient execution of speculative threads and transactions with hardware transactional memory
Future Generation Computer Systems
Hi-index | 0.00 |
While architects understand how to build cost-effective parallel machines across a wide spectrum of machine sizes (ranging from within a single chip to large-scale servers), the real challenge is how to easily create parallel software to effectively exploit all of this raw performance potential. One promising technique for overcoming this problem is Thread-Level Speculation (TLS), which enables the compiler to optimistically create parallel threads despite uncertainty as to whether those threads are actually independent. In this paper, we propose and evaluate a design for supporting TLS that seamlessly scales to any machine size because it is a straightforward extension of writeback invalidation-based cache coherence (which itself scales both up and down). Our experimental results demonstrate that our scheme performs well on both single-chip multiprocessors and on larger-scale machines where communication latencies are twenty times larger.