Dynamic dependency analysis of ordinary programs

Authors:
Todd M. Austin;Gurindar S. Sohi
Affiliations:
-;-
Venue:
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Year:
1992

Citing 7
Cited 55

Measuring Parallelism in Computation-Intensive Scientific/Engineering Applications

IEEE Transactions on Computers
Resource requirements of dataflow programs

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Assessing the benefits of fine-grain parallelism in dataflow programs

Proceedings of the 1988 ACM/IEEE conference on Supercomputing
Limits on multiple instruction issue

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Limits of instruction-level parallelism

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Single instruction stream parallelism is greater than two

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Monsoon: an explicit token-store architecture

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture

The expandable split window paradigm for exploiting fine-grain parallelsim

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
On the limits of program parallelism and its smoothability

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
On the instruction-level characteristics of scalar code in highly-vectorized scientific applications

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Extraction of massive instruction level parallelism

ACM SIGARCH Computer Architecture News
Theoretical modeling of superscalar processor performance

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Streamlining data cache access with fast address calculation

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
CRAIG: a practical framework for combining instruction scheduling and register assignment

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
A limit study of local memory requirements using value reuse profiles

Proceedings of the 28th annual international symposium on Microarchitecture
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References

IEEE Transactions on Computers
The intrinsic bandwidth requirements of ordinary programs

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The performance potential of data dependence speculation & collapsing

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Measuring limits of parallelism and characterizing its vulnerability to resource constraints

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Techniques for extracting instruction level parallelism on MIMD architectures

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Control flow prediction for dynamic ILP processors

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences

Proceedings of the 24th annual international symposium on Computer architecture
Reducing the performance impact of instruction cache misses by writing instructions into the reservation stations out-of-order

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Highly accurate data value prediction using hybrid predictors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The potential of data value speculation to boost ILP

ICS '98 Proceedings of the 12th international conference on Supercomputing
Using complete system simulation to characterize SPECjvm98 benchmarks

Proceedings of the 14th international conference on Supercomputing
Understanding the backward slices of performance degrading instructions

Proceedings of the 27th annual international symposium on Computer architecture
Java Runtime Systems: Characterization and Architectural Implications

IEEE Transactions on Computers
A time-stamping algorithm for efficient performance estimation of superscalar processors

Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Sensitivity analysis of a superscalar processor model

CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
Architectural differences of efficient sequential and parallel computers

Journal of Systems Architecture: the EUROMICRO Journal
Loop-Level Parallelism in Numeric and Symbolic Programs

IEEE Transactions on Parallel and Distributed Systems
Limits and Graph Structure of Available Instruction-Level Parallelism (Research Note)

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Realizing High IPC Using Time-Tagged Resource-Flow Computing

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Quantifying behavioral differences between multimedia and general-purpose workloads

Journal of Systems Architecture: the EUROMICRO Journal
Hierarchical Scheduling Windows

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Evaluating Signal Processing and Multimedia Applications on SIMD, VLIW and Superscalar Architectures

ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Fingerprinting: bounding soft-error detection latency and bandwidth

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Performance of Runtime Optimization on BLAST

Proceedings of the international symposium on Code generation and optimization
An Application Analysis Framework For Polymorphic Chip Multiprocessors

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
The impact of x86 instruction set architecture on superscalar processing

Journal of Systems Architecture: the EUROMICRO Journal
Chip multi-processor scalability for single-threaded applications

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Estimating critical region parallelism to guide platform retargeting

Proceedings of the 43rd annual Southeast regional conference - Volume 1
Visual simulator for ILP dynamic OOO processor

WCAE '04 Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture
Quantifying ILP by means of graph theory

Proceedings of the 2nd international conference on Performance evaluation methodologies and tools
Visualizing potential parallelism in sequential programs

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Remote attestation on program execution

Proceedings of the 3rd ACM workshop on Scalable trusted computing
Development and specification of a reference model for agent-based systems

IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews
The potential of using dynamic information flow analysis in data value prediction

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Parallelism and data movement characterization of contemporary application classes

Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Kremlin: rethinking and rebooting gprof for the multicore age

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Parkour: parallel speedup estimates for serial programs

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Kismet: parallel speedup estimates for serial programs

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Leveraging Strength-Based Dynamic Information Flow Analysis to Enhance Data Value Prediction

ACM Transactions on Architecture and Code Optimization (TACO)
Limits of parallelism using dynamic dependency graphs

WODA '09 Proceedings of the Seventh International Workshop on Dynamic Analysis
Analysis of pure methods using garbage collection

Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Dynamic trace-based analysis of vectorization potential of applications

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Integrating and querying source code of programs working on a database

KEYS '12 Proceedings of the Third International Workshop on Keyword Search on Structured Data
Querying external source code files of programs connecting to a relational database

Proceedings of the 5th Ph.D. workshop on Information and knowledge
Hazard driven test generation for SMT processors

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.01

Visualization

Abstract

A quantitative analysis of program execution is essential to the computer architecture design process. With the current trend in architecture of enhancing the performance of uniprocessors by exploiting fine-grain parallelism, first-order metrics of program execution, such as operation frequencies, are not sufficient; characterizing the exact nature of dependencies between operations is essential.This paper presents a methodology for constructing the dynamic execution graph that characterizes the execution of an ordinary program (an application program written in an imperatibve language such as C or FORTRAN) from a serial execution trace of the program. It then uses the methodology to study parallelism in the SPEC benchmarks. We see that the prallelism can be bursty in nature (periods of lots of parallelism followed by periods of little parallelism), but the average parallelism is quite high, ranging from 13 to 23,302 operations per cycle. Exposing this parallelism requires renaming of both registers and memory, though renaming registers alone exposes much of this parallelism. We also see that fairly large windows of dynamic instructions would be required to expose this parallelism from a sequential instruction stream.