Toward kilo-instruction processors

Authors:
Adrián Cristal;Oliverio J. Santana;Mateo Valero;José F. Martínez
Affiliations:
Universitat Politècnica de Catalunya, Barcelona, Spain;Universitat Politècnica de Catalunya, Barcelona, Spain;Universitat Politècnica de Catalunya, Barcelona, Spain;Cornell University, Ithaca, NY
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2004

Citing 33
Cited 26

Checkpoint repair for out-of-order execution machines

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
An architecture for software-controlled data prefetching

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
An effective on-chip preloading scheme to reduce data access penalty

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Register renaming and dynamic speculation: an alternative approach

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Improving data cache performance by pre-executing instructions under a cache miss

ICS '97 Proceedings of the 11th international conference on Supercomputing
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
Prefetching using Markov predictors

Proceedings of the 24th annual international symposium on Computer architecture
Memory dependence prediction using store sets

Proceedings of the 25th annual international symposium on Computer architecture
Simultaneous subordinate microthreading (SSMT)

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Delaying physical register allocation through virtual-physical registers

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Implementation of precise interrupts in pipelined processors

ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Cache Memories

ACM Computing Surveys (CSUR)
Space/time trade-offs in hash coding with allowable errors

Communications of the ACM
Execution-based prediction using speculative slices

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Using a user-level memory thread for correlation prefetching

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Dynamic speculative precomputation

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Cherry: checkpointed early resource recycling in out-of-order microprocessors

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Hierarchical Scheduling Windows

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Speculative Data-Driven Multithreading

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Scalable Hardware Memory Disambiguation for High ILP Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Reducing Design Complexity of the Load/Store Queue

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
A first glance at Kilo-instruction based multiprocessors

Proceedings of the 1st conference on Computing frontiers
Memory Ordering: A Value-Based Approach

Proceedings of the 31st annual international symposium on Computer architecture
Continual flow pipelines

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Out-of-Order Commit Processors

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
A case for resource-conscious out-of-order processors

IEEE Computer Architecture Letters
Checkpoint Processing and Recovery: An Efficient, Scalable Alternative to Reorder Buffers

IEEE Micro
Runahead Execution: An Effective Alternative to Large Instruction Windows

IEEE Micro

Kilo-Instruction Processors: Overcoming the Memory Wall

IEEE Micro
A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Deep Jam: Conversion of Coarse-Grain Parallelism to Instruction-Level and Vector Parallelism for Irregular Applications

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Chip multi-processor scalability for single-threaded applications

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Kilo-instruction processors, runahead and prefetching

Proceedings of the 3rd conference on Computing frontiers
A PAB-based multi-prefetcher mechanism

International Journal of Parallel Programming
A scalable low power issue queue for large instruction window processors

Proceedings of the 20th annual international conference on Supercomputing
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Building a large instruction window through ROB compression

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Hiding the misprediction penalty of a resource-efficient high-performance processor

ACM Transactions on Architecture and Code Optimization (TACO)
Memory-level parallelism aware fetch policies for simultaneous multithreading processors

ACM Transactions on Architecture and Code Optimization (TACO)
Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A mechanistic performance model for superscalar out-of-order processors

ACM Transactions on Computer Systems (TOCS)
Reexecution and Selective Reuse in Checkpoint Processors

Transactions on High-Performance Embedded Architectures and Compilers II
InvisiFence: performance-transparent memory ordering in conventional multiprocessors

Proceedings of the 36th annual international symposium on Computer architecture
Checkpoint allocation and release

ACM Transactions on Architecture and Code Optimization (TACO)
Design and optimization of the store vectors memory dependence predictor

ACM Transactions on Architecture and Code Optimization (TACO)
Exploiting execution locality with a decoupled Kilo-instruction processor

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
CROB: implementing a large instruction window through compression

Transactions on high-performance embedded architectures and compilers III
Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs

ACM Transactions on Architecture and Code Optimization (TACO)
Efficiently exploiting memory level parallelism on asymmetric coupled cores in the dark silicon era

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Virtual register renaming

ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Tuning the continual flow pipeline architecture

Proceedings of the 27th international ACM conference on International conference on supercomputing
Virtual register renaming: energy efficient substrate for continual flow pipelines

Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Tuning the continual flow pipeline architecture with virtual register renaming

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

The continuously increasing gap between processor and memory speeds is a serious limitation to the performance achievable by future microprocessors. Currently, processors tolerate long-latency memory operations largely by maintaining a high number of in-flight instructions. In the future, this may require supporting many hundreds, or even thousands, of in-flight instructions. Unfortunately, the traditional approach of scaling up critical processor structures to provide such support is impractical at these levels, due to area, power, and cycle time constraints.In this paper we show that, in order to overcome this resource-scalability problem, the way in which critical processor resources are managed must be changed. Instead of simply upsizing the processor structures, we propose a smarter use of the available resources, supported by a selective checkpointing mechanism. This mechanism allows instructions to commit out of order, and makes a reorder buffer unnecessary. We present a set of techniques such as multilevel instruction queues, late allocation and early release of registers, and early release of load/store queue entries. All together, these techniques constitute what we call a kilo-instruction processor, an architecture that can support thousands of in-flight instructions, and thus may achieve high performance even in the presence of large memory access latencies.