Designing the TFP Microprocessor

Authors:
Peter Yan-Tek Hsu
Venue:
IEEE Micro
Year:
1994

Abstract

Designed to efficiently support large, real-world, floating-point-intensive applications, the TFP (short for Tremendous Floating-Point) microprocessor is a superscalar implementation of the Mips Technologies architecture. This floating-point, computation-oriented processor uses a superscalar machine organization that dispatches up to four instructions each clock cycle to two floating-point execution units, two memory load/store units, and two integer execution units. Its split-level cache structure reduces cache misses by directing integer data references to a 16-Kbyte on-chip cache, while channeling floating-point data references off chip to a 4-Mbyte cache.
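The dispatch limits and split-level cache routing described above can be illustrated with a toy model. This is a minimal sketch, not the paper's design: it assumes in-order issue that halts at the first instruction whose unit class is already full, and it ignores data dependences, branches, and cache-miss timing. The names `Unit`, `Instr`, `dispatch_cycle`, and `cache_level`, and the MIPS-style instruction labels in the example program, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Unit(Enum):
    """Execution-unit classes named in the abstract."""
    INT = "integer"
    FP = "floating-point"
    MEM = "load/store"


# Figures from the abstract: up to four instructions per cycle,
# spread over two units of each class.
MAX_DISPATCH = 4
UNITS_PER_CLASS = {Unit.INT: 2, Unit.FP: 2, Unit.MEM: 2}


@dataclass(frozen=True)
class Instr:
    name: str             # hypothetical label, e.g. "ldc1 f0"
    unit: Unit
    fp_data: bool = False  # for MEM ops: does it move floating-point data?


def dispatch_cycle(queue):
    """Return (issued, remaining) for one clock cycle.

    Assumption (not from the paper): in-order issue that stops at the
    first instruction whose unit class is full. Dependences are ignored.
    """
    used = {u: 0 for u in Unit}
    issued = []
    for instr in queue:
        if len(issued) == MAX_DISPATCH:
            break
        if used[instr.unit] == UNITS_PER_CLASS[instr.unit]:
            break
        used[instr.unit] += 1
        issued.append(instr)
    return issued, queue[len(issued):]


def cache_level(instr):
    """Split-level cache: integer data on chip, FP data off chip."""
    if instr.unit is not Unit.MEM:
        raise ValueError(f"{instr.name} does not reference memory")
    if instr.fp_data:
        return "4-Mbyte off-chip cache"
    return "16-Kbyte on-chip cache"


if __name__ == "__main__":
    program = [
        Instr("ldc1 f0", Unit.MEM, fp_data=True),
        Instr("ldc1 f2", Unit.MEM, fp_data=True),
        Instr("madd.d", Unit.FP),
        Instr("lw r4", Unit.MEM),
        Instr("addu", Unit.INT),
    ]
    for instr in program:
        if instr.unit is Unit.MEM:
            print(instr.name, "->", cache_level(instr))
    cycle = 0
    while program:
        issued, program = dispatch_cycle(program)
        print("cycle", cycle, [i.name for i in issued])
        cycle += 1
```

Run as a script, the model issues only three instructions in the first cycle: the third memory reference must wait because both load/store units are occupied, even though the four-instruction dispatch limit has not been reached. The two floating-point loads are routed to the off-chip cache and the integer load to the on-chip cache.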