The Impact of Instruction-Level Parallelism on Multiprocessor Performance and Simulation Methodology

Authors:
Vijay S. Pai; Parthasarathy Ranganathan; Sarita V. Adve
Venue:
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Year:
1997

Citing 0
Cited 30

Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
The interaction of software prefetching with ILP processors in shared-memory systems

Proceedings of the 24th annual international symposium on Computer architecture
Analytic evaluation of shared-memory systems with ILP processors

Proceedings of the 25th annual international symposium on Computer architecture
Performance Evaluation and Cost Analysis of Cache Protocol Extensions for Shared-Memory Multiprocessors

IEEE Transactions on Computers
Fast out-of-order processor simulation using memoization

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Performance of database workloads on shared-memory systems with out-of-order processors

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors

IEEE Transactions on Computers - Special issue on cache memory and related problems
Performance of image and video processing with general-purpose processors and media ISA extensions

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Comprehensive Hardware and Software Support for Operating Systems to Exploit MP Memory Hierarchies

IEEE Transactions on Computers
Scal-Tool: pinpointing and quantifying scalability bottlenecks in DSM multiprocessors

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
FLASH vs. (simulated) FLASH: closing the simulation loop

ACM SIGPLAN Notices
FLASH vs. (Simulated) FLASH: closing the simulation loop

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
RSIM: Simulating Shared-Memory Multiprocessors with ILP Processors

Computer
POEMS: End-to-End Performance Design of Large Parallel Adaptive Computational Systems

IEEE Transactions on Software Engineering
On the Exploitation of Value Prediction and Producer Identification to Reduce Barrier Synchronization Time

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Minerva: An Adaptive Subblock Coherence Protocol for Improved SMP Performance

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Improving the Performance of Heterogeneous DSMs via Multithreading

VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
Using Interaction Costs for Microarchitectural Bottleneck Analysis

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism

Proceedings of the 31st annual international symposium on Computer architecture
Interaction cost and shotgun profiling

ACM Transactions on Architecture and Code Optimization (TACO)
Interaction Cost: For When Event Counts Just Don't Add Up

IEEE Micro
Application Representations for Multiparadigm Performance Modeling of Large-Scale Parallel Scientific Codes

International Journal of High Performance Computing Applications
High-level power analysis for multi-core chips

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Impulse: Memory system support for scientific applications

Scientific Programming
An analysis of the effects of miss clustering on the cost of a cache miss

Proceedings of the 4th international conference on Computing frontiers
A network-computing infrastructure for tool experimentation applied to computer architecture education

WCAE '00 Proceedings of the 2000 workshop on Computer architecture education
Pipeline spectroscopy

Proceedings of the 2007 workshop on Experimental computer science
Pipeline spectroscopy

ecs '07 Proceedings of the 2007 workshop on Experimental computer science
Statistical sampling of microarchitecture simulation

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Mesoscale performance simulation of multicore processor systems

Software and Systems Modeling (SoSyM)

Quantified Score

Hi-index	0.01

Abstract

Scalable Flat Cache Only Memory Architectures (Flat COMA) are designed for reduced memory access latencies while minimizing programmer and operating system involvement. Indeed, to keep memory access latencies low, neither the programmer needs to perform ...