The anatomy of the register file in a multiscalar processor
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Proceedings of the 28th annual international symposium on Microarchitecture
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References
IEEE Transactions on Computers
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Speculative multithreaded processors
ICS '98 Proceedings of the 12th international conference on Supercomputing
ICS '98 Proceedings of the 12th international conference on Supercomputing
Task selection for a multiscalar processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Parallel Programming with Polaris
Computer
Computer
MINT: A Front End for Efficient Simulation of Shared-Memory Multiprocessors
MASCOTS '94 Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems
The PowerPC 620 microprocessor: a high performance superscalar RISC microprocessor
COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
An Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor
Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences
IEEE Transactions on Parallel and Distributed Systems
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Removing architectural bottlenecks to the scalability of speculative parallelization
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
An energy saving strategy based on adaptive loop parallelization
Proceedings of the 39th annual Design Automation Conference
Proceedings of the 39th annual Design Automation Conference
Trident: a scalable architecture for scalar, vector, and matrix operations
CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
A general compiler framework for speculative multithreading
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Proceedings of the 15th international symposium on System Synthesis
Speculative synchronization: applying thread-level speculation to explicitly parallel applications
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors
International Journal of Parallel Programming
Exploiting Data Value Prediction in Compiler Based Thread Formation
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Improving Conditional Branch Prediction on Speculative Multithreading Architectures
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Compiler support for speculative multithreading architecture with probabilistic points-to analysis
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Tuning data replication for improving behavior of MPSoC applications
Proceedings of the 14th ACM Great Lakes symposium on VLSI
Transactional Memory Coherence and Consistency
Proceedings of the 31st annual international symposium on Computer architecture
A General Compiler Framework for Speculative Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
Dynamic on-chip memory management for chip multiprocessors
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Programming with transactional coherence and consistency (TCC)
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Architectural Support for Enhanced SMT Job Scheduling
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Compiler Estimation of Load Imbalance Overhead in Speculative Parallelization
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Optimizing Array-Intensive Applications for On-Chip Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Exploiting Barriers to Optimize Power Consumption of CMPs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Using data compression in an MPSoC architecture for improving performance
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Dynamic run-time architecture techniques for enabling continuous optimization
Proceedings of the 2nd conference on Computing frontiers
Reducing misspeculation overhead for module-level speculative execution
Proceedings of the 2nd conference on Computing frontiers
Optimal task scheduling algorithm for cyclic synchronous tasks in general multiprocessor networks
Journal of Parallel and Distributed Computing
Locality-conscious workload assignment for array-based computations in MPSOC architectures
Proceedings of the 42nd annual Design Automation Conference
The STAMPede approach to thread-level speculation
ACM Transactions on Computer Systems (TOCS)
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation
Proceedings of the 19th annual international conference on Supercomputing
Thread-Level Speculation on a CMP can be energy efficient
Proceedings of the 19th annual international conference on Supercomputing
Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Characterization of TCC on Chip-Multiprocessors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Queue - Multiprocessors
Methods for Modeling Resource Contention on Simultaneous Multithreading Processors
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A chip prototyping substrate: the flexible architecture for simulation and testing (FAST)
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
SableSpMT: a software framework for analysing speculative multithreading in Java
PASTE '05 Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
Energy-Efficient Thread-Level Speculation
IEEE Micro
Energy-aware computation duplication for improving reliability in embedded chip multiprocessors
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
SCMP: a single-chip message-passing parallel computer
The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
Customized on-chip memories for embedded chip multiprocessors
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
POSH: a TLS compiler that exploits program structure
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Selective code/data migration for reducing communication energy in embedded MpSoC architectures
GLSVLSI '06 Proceedings of the 16th ACM Great Lakes symposium on VLSI
CAVA: Using checkpoint-assisted value prediction to hide L2 misses
ACM Transactions on Architecture and Code Optimization (TACO)
Reducing energy consumption of multiprocessor SoC architectures by exploiting memory bank locality
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Speculative pre-execution assisted by compiler (SPEAR)
Journal of Parallel and Distributed Computing - Special issue on parallel bioinspired algorithms
Design and evaluation of a hierarchical decoupled architecture
The Journal of Supercomputing
A probabilistic pointer analysis for speculative optimizations
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Executing Java programs with transactional memory
Science of Computer Programming - Special issue: Synchronization and concurrency in object-oriented languages
Hardware support for software controlled multithreading
ACM SIGARCH Computer Architecture News
Mechanisms for store-wait-free multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
A compiler cost model for speculative parallelization
ACM Transactions on Architecture and Code Optimization (TACO)
MPSoC memory optimization using program transformation
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Reducing off-chip memory access costs using data recomputation in embedded chip multi-processors
Proceedings of the 44th annual Design Automation Conference
SoftSig: software-exposed hardware signatures for code analysis and optimization
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Optimistic parallelism benefits from data partitioning
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
The potential for variable-granularity access tracking for optimistic parallelism
Proceedings of the 2008 ACM SIGPLAN workshop on Memory systems performance and correctness: held in conjunction with the Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '08)
A dynamically reconfigurable cache for multithreaded processors
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Scheduling strategies for optimistic parallel execution of irregular programs
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Fetch-Criticality Reduction through Control Independence
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
A low-complexity microprocessor design with speculative pre-execution
Journal of Systems Architecture: the EUROMICRO Journal
A hybrid open queuing network model approach for multi-threaded dataflow architecture
Computer Communications
On the Scalability of an Automatically Parallelized Irregular Application
Languages and Compilers for Parallel Computing
How much parallelism is there in irregular applications?
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
DMP: deterministic shared memory multiprocessing
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Strategies for mapping dataflow blocks to distributed hardware
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Optimistic concurrency for clusters via speculative locking
SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Mostly static program partitioning of binary executables
ACM Transactions on Programming Languages and Systems (TOPLAS)
Proceedings of the 36th annual international symposium on Computer architecture
SPARTAN: A software tool for Parallelization Bottleneck Analysis
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Multiprocessor, Multithreading and Memory Optimization for On-Chip Multimedia Applications
Journal of Signal Processing Systems
Using data compression for increasing memory system utilization
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Speculative parallelization of partial reduction variables
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Proceedings of the 7th ACM international conference on Computing frontiers
Software data spreading: leveraging distributed caches to improve single thread performance
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Parallel inclusion-based points-to analysis
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Improving cache locality for thread-level speculation
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Runtime parallelization of legacy code on a transactional memory system
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Inter-core prefetching for multicore processors using migrating helper threads
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Circuit design of a dual-versioning L1 data cache for optimistic concurrency
Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
Understanding bloom filter intersection for lazy address-set disambiguation
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
International Journal of Parallel Programming
ALTER: exploiting breakable dependences for parallelization
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Supporting speculative multithreading on simultaneous multithreaded processors
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
PLDS: Partitioning linked data structures for parallelism
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Acceleration techniques for chip-multiprocessor simulator debug
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
A low-complexity issue queue design with speculative pre-execution
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Memory subsystem characterization in a 16-core snoop-based chip-multiprocessor architecture
HPCC'05 Proceedings of the First international conference on High Performance Computing and Communications
ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
Design and effectiveness of small-sized decoupled dispatch queues
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Circuit design of a dual-versioning L1 data cache
Integration, the VLSI Journal
HiRe: using hint & release to improve synchronization of speculative threads
Proceedings of the 26th ACM international conference on Supercomputing
SCIN-cache: Fast speculative versioning in multithreaded cores
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
An automatic thread decomposition approach for pipelined multithreading
International Journal of High Performance Computing and Networking
Accelerating sequential programs on commodity multi-core processors
Journal of Parallel and Distributed Computing
Hi-index | 14.98 |
Much emphasis is now placed on chip-multiprocessor (CMP) architectures for exploiting thread-level parallelism in an application. In such architectures, speculation may be employed to execute applications that cannot be parallelized statically. In this paper, we present an efficient CMP architecture for speculative execution of sequential binaries without source recompilation. We present the software support that enables identification of threads from a sequential binary. The hardware includes a memory disambiguation mechanism that enables the detection of interthread memory dependence violations during speculative execution. This hardware is different from past proposals in that it does not rely on a snoopy-based cache-coherence protocol. Instead, it uses an approach similar to a directory-based scheme. Furthermore, the architecture includes a simple and efficient hardware mechanism to enable register-level communication between on-chip processors. Evaluation of this software-hardware approach shows that it is quite effective in achieving high performance when running sequential binaries.