Simulating a LAGS processor to consider variable latency on L1 D-Cache

Authors:
J. Manuel Colmenar;Oscar Garnica;Juan Lanchares;J. Ignacio Hidalgo
Affiliations:
I. T. en Informática de Sistemas, C.E.S. Felipe II, U. Complutense de Madrid (UCM), Aranjuez, Spain;U. Complutense de Madrid (UCM), Madrid, Spain;U. Complutense de Madrid (UCM), Madrid, Spain;U. Complutense de Madrid (UCM), Madrid, Spain
Venue:
Proceedings of the 2010 Summer Computer Simulation Conference
Year:
2010

Abstract

Variability is one of the key issues in deep-submicron technologies, and assuming constant, non-variable latencies for the modules of deep-submicron processors can jeopardize their performance. Cache memories exhibit data-dependent latency due to factors such as coupling capacitances and the distance between the port and the requested data. In this paper we present a scheme to detect read completion in a variable-latency cache memory, together with an asynchronous approach that exploits this feature to improve processor performance. Specifically, we propose a Locally-Asynchronous Globally-Synchronous (LAGS) superscalar microarchitecture in which read operations on a variable-latency L1 data cache are managed through an asynchronous wrapper. We demonstrate its feasibility by running SPEC2000 benchmarks on a 64-bit superscalar processor modeled with an architectural simulator. Simulations show speedups of up to 1.44, averaging 1.22, over a non-variable cache design.
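
The idea of completing reads as soon as the data is ready, instead of always waiting for the worst case, can be illustrated with a minimal sketch. It is not the authors' simulator or model: the function names, the read delays, and the clock period are all hypothetical, and the sketch assumes the asynchronous wrapper hands each read back to the synchronous core at the next clock edge after completion is detected. It counts only cache read cycles, so its ratio is not comparable to the whole-processor speedups reported in the abstract.

```python
import math

def fixed_latency_cycles(read_delays_ns, clock_ns):
    """Synchronous cache: every read is charged the worst-case delay,
    rounded up to a whole number of clock cycles."""
    worst_case = max(read_delays_ns)
    return len(read_delays_ns) * math.ceil(worst_case / clock_ns)

def variable_latency_cycles(read_delays_ns, clock_ns):
    """Variable-latency cache: each read ends when its own completion is
    detected, then waits for the next clock edge (assumed wrapper behavior)."""
    return sum(math.ceil(delay / clock_ns) for delay in read_delays_ns)

# Hypothetical per-read delays in nanoseconds; not measured data.
delays = [0.9, 1.4, 0.8, 2.6, 1.1, 0.7]
clock = 1.0  # hypothetical clock period in nanoseconds

fixed = fixed_latency_cycles(delays, clock)
variable = variable_latency_cycles(delays, clock)
print(f"fixed: {fixed} cycles, variable: {variable} cycles")
```

With these made-up delays the fixed-latency design spends 18 cycles on the six reads and the variable-latency design spends 10. The gap depends entirely on how far typical delays fall below the worst case.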