A quantitative analysis of program execution is essential to the computer architecture design process. The current trend in architecture is to enhance uniprocessor performance by exploiting fine-grain parallelism. Under that trend, first-order metrics of program execution, such as operation frequencies, are not sufficient; characterizing the exact nature of the dependencies between operations is essential.

This paper presents a methodology for constructing the dynamic execution graph that characterizes the execution of an ordinary program (an application program written in an imperative language such as C or FORTRAN) from a serial execution trace of the program. It then uses the methodology to study parallelism in the SPEC benchmarks. We see that the parallelism can be bursty in nature (periods of abundant parallelism followed by periods of little parallelism), but the average parallelism is quite high, ranging from 13 to 23,302 operations per cycle. Exposing this parallelism requires renaming both registers and memory, though renaming registers alone exposes much of it. We also see that fairly large windows of dynamic instructions would be required to expose this parallelism from a sequential instruction stream.
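To make the trace-driven analysis concrete, here is a minimal sketch of the general idea. It is not the authors' tool. It walks a serial trace, places each operation in the earliest cycle its dependencies allow, and reports the average parallelism together with a per-cycle profile, which is where burstiness shows up. The following are all illustrative assumptions:

* the trace format (the hypothetical `Op` record, with memory locations named by a `"mem:"` prefix);
* unit latency and unlimited functional units;
* the rule that an anti-dependent write lands strictly after the last read;
* the in-order-retirement window model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    """One dynamic operation from a serial trace (hypothetical trace format)."""
    dests: tuple  # locations written, e.g. "r1" or "mem:A"
    srcs: tuple   # locations read


def is_mem(loc: str) -> bool:
    # Assumption: memory locations are named with a "mem:" prefix.
    return loc.startswith("mem:")


def schedule(trace, rename_regs=True, rename_mem=True, window: Optional[int] = None):
    """Place each op in the earliest cycle its dependencies allow.

    Assumes unit latency and unlimited functional units; returns one cycle per op.
    """
    last_write = {}  # location -> cycle of its most recent writer (in trace order)
    last_read = {}   # location -> latest cycle in which it was read
    cycles = []
    prefix_max = []  # prefix_max[k] = latest cycle among ops 0..k
    for i, op in enumerate(trace):
        # True (read-after-write) dependencies always constrain execution.
        t = 1 + max((last_write.get(s, 0) for s in op.srcs), default=0)
        for d in op.dests:
            renamed = rename_mem if is_mem(d) else rename_regs
            if not renamed:
                # Without renaming, output (WAW) and anti (WAR) dependencies also apply.
                t = max(t, last_write.get(d, 0) + 1, last_read.get(d, 0) + 1)
        if window is not None and i >= window:
            # Assumed window model: op i enters the window only after
            # ops 0..i-window have all completed (in-order retirement).
            t = max(t, prefix_max[i - window] + 1)
        for s in op.srcs:
            last_read[s] = max(last_read.get(s, 0), t)
        for d in op.dests:
            last_write[d] = t
        cycles.append(t)
        prefix_max.append(max(prefix_max[-1], t) if prefix_max else t)
    return cycles


def parallelism(cycles):
    """Return (average ops per cycle, ops executed in each cycle). Needs a non-empty trace."""
    profile = Counter(cycles)
    length = max(cycles)  # critical-path length in cycles
    return len(cycles) / length, [profile[c] for c in range(1, length + 1)]


trace = [
    Op(dests=("r1",), srcs=("mem:A",)),  # r1 = load A
    Op(dests=("r2",), srcs=("r1",)),     # r2 = r1 + 1
    Op(dests=("r1",), srcs=("mem:B",)),  # r1 = load B (reuses r1)
    Op(dests=("r3",), srcs=("r1",)),     # r3 = r1 * 2
]
print(parallelism(schedule(trace)))                     # (2.0, [2, 2])
print(parallelism(schedule(trace, rename_regs=False)))  # (1.0, [1, 1, 1, 1])
print(parallelism(schedule(trace, window=1)))           # (1.0, [1, 1, 1, 1])
```

In this toy trace, reusing `r1` serializes the two independent load chains unless registers are renamed. That mirrors, on a tiny scale, the abstract's observation that renaming exposes much of the parallelism. A window of one instruction likewise reduces execution to a sequential order.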