ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
Available instruction-level parallelism for superscalar and superpipelined machines
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Limits on multiple instruction issue
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Characteristics of performance-optimal multi-level cache hierarchies
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Design of the IBM RISC System/6000 floating-point execution unit
IBM Journal of Research and Development
Improving resource utilization of the MIPS R8000 via post-scheduling global instruction distribution
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
A comparison of two pipeline organizations
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Resource allocation in a high clock rate microprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Fine-grain access control for distributed shared memory
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiling and optimizing for decoupled architectures
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Exploring configurations of functional units in an out-of-order superscalar processor
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Next cache line and set prediction
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Streamlining data cache access with fast address calculation
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Zero-cycle loads: microarchitecture support for reducing load latency
Proceedings of the 28th annual international symposium on Microarchitecture
Software pipelining showdown: optimal vs. heuristic methods in a production compiler
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Improving single-process performance with multithreaded processors
ICS '96 Proceedings of the 10th international conference on Supercomputing
Wrong-path instruction prefetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Increasing memory bandwidth with wide buses: compiler, hardware and performance trade-offs
ICS '97 Proceedings of the 11th international conference on Supercomputing
Microarchitecture support for improving the performance of load target prediction
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Quantitative Evaluation of Register Pressure on Software Pipelined Loops
International Journal of Parallel Programming
IEEE Transactions on Computers
Improving Latency Tolerance of Multithreading through Decoupling
IEEE Transactions on Computers
Trident: a scalable architecture for scalar, vector, and matrix operations
CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
A Simulation Study of Decoupled Vector Architectures
The Journal of Supercomputing
IEEE Micro
Register File Energy Reduction by Operand Data Reuse
PATMOS '02 Proceedings of the 12th International Workshop on Integrated Circuit Design. Power and Timing Modeling, Optimization and Simulation
Non-Consistent Dual Register Files to Reduce Register Pressure
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Decoupled vector architectures
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Store Buffer Design in First-Level Multibanked Data Caches
Proceedings of the 32nd annual international symposium on Computer Architecture
Optimizing the performance of dynamically-linked programs
TCON'95 Proceedings of the USENIX 1995 Technical Conference Proceedings
Hi-index | 0.01 |
Designed to efficiently support large, real-world, floating-point-intensive applications, the TFP (short for Tremendous Floating-Point) microprocessor is a superscalar implementation of the Mips Technologies architecture. This floating-point, computation-oriented processor uses a superscalar machine organization that dispatches up to four instructions each clock cycle to two floating-point execution units, two memory load/store units, and two integer execution units. Its split-level cache structure reduces cache misses by directing integer data references to a 16-Kbyte on-chip cache, while channeling floating-point data references off chip to a 4 Mbyte cache.