Analysis of multilevel graph partitioning
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
ICS '98 Proceedings of the 12th international conference on Supercomputing
A dynamic multithreading processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
Execution-based prediction using speculative slices
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Speculative precomputation: long-range prefetching of delinquent loads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Removing architectural bottlenecks to the scalability of speculative parallelization
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Master/slave speculative parallelization
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
A Cost-Effective Clustered Architecture
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Thread-Spawning Schemes for Speculative Multithreading
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Min-cut program decomposition for thread-level speculation
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Back-end assignment schemes for clustered multithreaded processors
Proceedings of the 18th annual international conference on Supercomputing
Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 33rd annual international symposium on Computer Architecture
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Speculative Decoupled Software Pipelining
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
CoreSymphony: an efficient reconfigurable multi-core architecture
ACM SIGARCH Computer Architecture News
Complementing user-level coarse-grain parallelism with implicit speculative parallelism
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Global register alias table: Boosting sequential program on multi-core
Future Generation Computer Systems
Hi-index | 0.01 |
Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single thread performance remains of paramount importance since some applications have limited thread-level parallelism (TLP), and even a small part with limited TLP impose important constraints to the global performance, as explained by Amdahl's law. In this paper we propose a novel approach for leveraging multiple cores to improve single-thread performance in a multi-core design. The proposed technique features a set of novel hardware mechanisms that support the execution of threads generated at compile time. These threads result from a fine-grain speculative decomposition of the original application and they are executed under a modified multi-core system that includes: (1) mechanisms to support multiple versions; (2) mechanisms to detect violations among threads; (3) mechanisms to reconstruct the original sequential order; and (4) mechanisms to checkpoint the architectural state and recovery to handle misspeculations. The proposed scheme outperforms previous hardware-only schemes to implement the idea of combining cores for executing single-thread applications in a multi-core design by more than 10% on average on Spec2006 for all configurations. Moreover, single-thread performance is improved by 41% on average when the proposed scheme is used on a Tiny Core, and up to 2.6x for some selected applications.