Pinot: Speculative Multi-threading Processor Architecture Exploiting Parallelism over a Wide Range of Granularities

Authors:
Taku Ohsawa;Masamichi Takagi;Shoji Kawahara;Satoshi Matsushita
Affiliations:
NEC Corporation;NEC Corporation;NEC Corporation;NEC Corporation
Venue:
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Year:
2005

Citing 12
Cited 8

Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Hardware and software support for speculative execution of sequential binaries on a chip-multiprocessor

ICS '98 Proceedings of the 12th international conference on Supercomputing
Task selection for a multiscalar processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Improving the performance of speculatively parallel applications on the Hydra CMP

ICS '99 Proceedings of the 13th international conference on Supercomputing
The Stanford Hydra CMP

IEEE Micro
A note on greedy algorithms for the maximum weighted independent set problem

Discrete Applied Mathematics
TEST: a tracer for extracting speculative threads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Speculative Versioning Cache

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Thread-Spawning Schemes for Speculative Multithreading

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation

PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
MiBench: A free, commercially representative embedded benchmark suite

WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop

Embedded DSP Processor Design: Application Specific Instruction Set Processors

Embedded DSP Processor Design: Application Specific Instruction Set Processors
Exploiting Speculative TLP in Recursive Programs by Dynamic Thread Prediction

CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Boosting single-thread performance in multi-core systems through fine-grain multi-threading

Proceedings of the 36th annual international symposium on Computer architecture
An Overview of Prophet

ICA3PP '09 Proceedings of the 9th International Conference on Algorithms and Architectures for Parallel Processing
Energy-efficient parallel software for mobile hand-held devices

HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Dual-thread speculation: a simple approach to uncover thread-level parallelism on a simultaneous multithreaded processor

International Journal of Parallel Programming
Runtime automatic speculative parallelization

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
A thread partitioning approach for speculative multithreading

The Journal of Supercomputing

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose a speculative multi-threading processor architecture called Pinot. Pinot exploits parallelism over a wide range of granularities without modifying program sources. Since exploitation of fine-grain parallelism suffers from limits of parallelism and overhead incurred by parallelization, it is better to extract coarse-grain parallelism. Coarse-grain parallelism is biased in some programs (mainly, numerical ones) and some program portions. Therefore, exploiting both coarse- and fine-grain parallelism is a key to the performance of speculative multithreading. The features of Pinot are as follows: (1) A parallelizing tool extracts parallelism at any level of granularity (e.g. even ten thousand instructions) from any program sub-structures (e.g. loops, calls, or basic blocks). The tool utilizes formulation in which the parallelization process is reduced to a combinatorial optimization problem. (2) A parallel execution model with extension of thread control instructions is designed in order to minimize the increase of the dynamic instruction count. The model employs implicit thread termination and cancellation, as well as register value transfer without synchronization. (3) A versioning cache called Version Resolution Cache (VRC) accomplishes both coarse- and fine-grained speculative multithreading. VRC operates as a large buffer for coarsegrained multi-threading. In addition, it provides low latency inter-thread communication with an update-based protocol for fine-grained multi-threading. We performed cycle-accurate simulations with 38 programs from the SPEC and MiBench benchmarks. The speedup with 4-processor-element- Pinot is up to 3.7 times, and 2.2 times on geometric mean against a conventional processor. The speedup in a program (susan) drops from 3.7 to 1.6 when the speculative buffer size is limited to 256 bytes. It confirms that exploiting coarse-grain parallelism is essential to the improved performance. FPGA implementation shows 32% overhead of area and 12% increase of