Boosting single-thread performance in multi-core systems through fine-grain multi-threading

  • Authors:
  • Carlos Madriles; Pedro López; Josep M. Codina; Enric Gibert; Fernando Latorre; Alejandro Martinez; Raúl Martinez; Antonio Gonzalez

  • Affiliations:
  • All authors: Intel Barcelona Research Center, Intel Labs - Universitat Politecnica de Catalunya, Barcelona, Spain

  • Venue:
  • Proceedings of the 36th annual international symposium on Computer architecture
  • Year:
  • 2009

Abstract

Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single-thread performance remains of paramount importance, since some applications have limited thread-level parallelism (TLP), and even a small portion of code with limited TLP imposes significant constraints on overall performance, as explained by Amdahl's law. In this paper we propose a novel approach for leveraging multiple cores to improve single-thread performance in a multi-core design. The proposed technique features a set of novel hardware mechanisms that support the execution of threads generated at compile time. These threads result from a fine-grain speculative decomposition of the original application, and they are executed on a modified multi-core system that includes: (1) mechanisms to support multiple versions; (2) mechanisms to detect violations among threads; (3) mechanisms to reconstruct the original sequential order; and (4) mechanisms to checkpoint the architectural state and recover from misspeculations. The proposed scheme outperforms previous hardware-only schemes that combine cores to execute single-thread applications in a multi-core design by more than 10% on average on Spec2006 for all configurations. Moreover, single-thread performance is improved by 41% on average when the proposed scheme is used on a Tiny Core, and by up to 2.6x for some selected applications.
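
The Amdahl's law argument in the abstract can be made concrete with a small calculation. The sketch below is illustrative only: the function name `amdahl_speedup` and the example fractions and core counts are assumptions chosen for this note, not figures from the paper. It computes the overall speedup when only part of a program scales with the number of cores, which shows why a small region with limited TLP caps the achievable gain.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Overall speedup under Amdahl's law.

    parallel_fraction: share of execution time that scales with core count (0..1).
    cores: number of cores applied to the parallel share.
    Both values are hypothetical inputs, not measurements from the paper.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial share runs at its original speed; only the parallel share is divided.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative case: 10% of the program has no thread-level parallelism.
    for n in (2, 4, 16, 1024):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

With 10% of the execution time left serial, the speedup stays below 10x however many cores are added, which is the motivation for speeding up the single-thread portion itself.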