A variable instruction stream extension to the VLIW architecture
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Single instruction stream parallelism is greater than two
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The expandable split window paradigm for exploiting fine-grain parallelsim
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
An elementary processor architecture with simultaneous instruction issuing from multiple threads
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Enhanced modulo scheduling for loops with conditional branches
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Proceedings of the 28th annual international symposium on Microarchitecture
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Compiler Techniques for Concurrent Multithreading with Hardware Speculation Support
LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Program Optimization for Concurrent Multithreaded Architectures
LCPC '97 Proceedings of the 10th International Workshop on Languages and Compilers for Parallel Computing
Performance Study of a Concurrent Multithreaded Processor
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
MASCOTS '98 Proceedings of the 6th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Extending Value Reuse to Basic Blocks with Compiler Support
IEEE Transactions on Computers
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences
IEEE Transactions on Parallel and Distributed Systems
Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor
ICS '01 Proceedings of the 15th international conference on Supercomputing
Removing architectural bottlenecks to the scalability of speculative parallelization
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
IEEE Transactions on Parallel and Distributed Systems
Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Design experience of a chip multiprocessor merlot and expectation to functional verification
Proceedings of the 15th international symposium on System Synthesis
Compiler optimization of scalar value communication between speculative threads
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Return-Address Prediction in Speculative Multithreaded Environments
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Compiler support for speculative multithreading architecture with probabilistic points-to analysis
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
A Comprehensive Dynamic Processor Allocation Scheme for Multiprogrammed Multiprocessor Systems
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes
Proceedings of the 30th annual international symposium on Computer architecture
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A cost-driven compilation framework for speculative parallelization of sequential programs
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
iWatcher: Efficient Architectural Support for Software Debugging
Proceedings of the 31st annual international symposium on Computer architecture
Compiler Estimation of Load Imbalance Overhead in Speculative Parallelization
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
The Impact of Incorrectly Speculated Memory Operations in a Multithreaded Architecture
IEEE Transactions on Parallel and Distributed Systems
Multithreaded Extension to Multicluster VLIW Processors for Embedded Applications
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Efficient and flexible architectural support for dynamic monitoring
ACM Transactions on Architecture and Code Optimization (TACO)
Design Space Exploration of a Software Speculative Parallelization Scheme
IEEE Transactions on Parallel and Distributed Systems
Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation
Proceedings of the 19th annual international conference on Supercomputing
Thread-Level Speculation on a CMP can be energy efficient
Proceedings of the 19th annual international conference on Supercomputing
Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Automatic Thread Extraction with Decoupled Software Pipelining
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
SableSpMT: a software framework for analysing speculative multithreading in Java
PASTE '05 Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
Energy-Efficient Thread-Level Speculation
IEEE Micro
POSH: a TLS compiler that exploits program structure
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Dynamic parallelization and mapping of binary executables on hierarchical platforms
Proceedings of the 3rd conference on Computing frontiers
Bulk Disambiguation of Speculative Threads in Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Proceedings of the 33rd annual international symposium on Computer Architecture
Hybrid multi-core architecture for boosting single-threaded performance
ACM SIGARCH Computer Architecture News
A compiler cost model for speculative parallelization
ACM Transactions on Architecture and Code Optimization (TACO)
A thread partitioning technique for multithreaded execution along hot paths
PDCN'07 Proceedings of the 25th conference on Proceedings of the 25th IASTED International Multi-Conference: parallel and distributed computing and networks
A hot path based thread partitioning technique for thread pipelining model
ACST'07 Proceedings of the third conference on IASTED International Conference: Advances in Computer Science and Technology
Accurate branch prediction for short threads
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Trends toward on-chip networked microsystems
International Journal of High Performance Computing and Networking
Compiler and hardware support for reducing the synchronization of speculative threads
ACM Transactions on Architecture and Code Optimization (TACO)
Performance scalability of decoupled software pipelining
ACM Transactions on Architecture and Code Optimization (TACO)
A hybrid open queuing network model approach for multi-threaded dataflow architecture
Computer Communications
On the potential of latency tolerant execution in speculative multithreading
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Mostly static program partitioning of binary executables
ACM Transactions on Programming Languages and Systems (TOPLAS)
Dynamic performance tuning for speculative threads
Proceedings of the 36th annual international symposium on Computer architecture
Exploiting speculative thread-level parallelism in data compression applications
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Speculative parallelization of partial reduction variables
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Supporting speculative parallelization in the presence of dynamic data structures
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Speculative parallelization using state separation and multiple value prediction
Proceedings of the 2010 international symposium on Memory management
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Runtime parallelization of legacy code on a transactional memory system
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Loop selection for thread-level speculation
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Cache management for discrete processor architectures
ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
HiRe: using hint & release to improve synchronization of speculative threads
Proceedings of the 26th ACM international conference on Supercomputing
Dynamically dispatching speculative threads to improve sequential execution
ACM Transactions on Architecture and Code Optimization (TACO)
Disjoint out-of-order execution processor
ACM Transactions on Architecture and Code Optimization (TACO)
Speculative parallelization: eliminating the overhead of failure
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
An automatic thread decomposition approach for pipelined multithreading
International Journal of High Performance Computing and Networking
A hyperscalar dual-core architecture for embedded systems
Microprocessors & Microsystems
Accelerating sequential programs on commodity multi-core processors
Journal of Parallel and Distributed Computing
Hi-index | 14.98 |
The common single-threaded execution model limits processors to exploiting only the relatively small amount of instruction-level parallelism available in application programs. The superthreaded processor, on the other hand, is a concurrent multithreaded architecture (CMA) that can exploit the multiple granularities of parallelism available in general-purpose application programs. Unlike other CMAs that rely primarily on hardware for run-time dependence detection and speculation, the superthreaded processor combines compiler-directed thread-level speculation of control and data dependences with run-time data dependence verification hardware. This hybrid of a superscalar processor and a multiprocessor-on-a-chip can utilize many of the existing compiler techniques used in traditional parallelizing compilers developed for multiprocessors. Additional unique compiler techniques, such as the conversion of data speculation into control speculation, are also introduced to generate the superthreaded code and to enhance the parallelism between threads. A detailed execution-driven simulator is used to evaluate the performance potential of this new architecture. It is found that a superthreaded processor can achieve good performance on complex application programs through this close coupling of compile-time and run-time information.