Scientific applications vs. SPEC-FP: a comparison of program behavior

Authors:
Kyle Rupnow;Arun Rodrigues;Keith Underwood;Katherine Compton
Affiliations:
Univ. of Wisconsin, Madison, WI and Sandia National Labs, Albuquerque, NM;Univ. of Notre Dame, Notre Dame, IN;Sandia National Labs, Albuquerque, NM;Univ. of Wisconsin, Madison, WI
Venue:
Proceedings of the 20th annual international conference on Supercomputing
Year:
2006

Abstract

Many modern scientific applications execute on massively parallel collections of microprocessors. Supercomputers such as the Cray XT3 (Red Storm) and Blue Gene/L support thousands to tens of thousands of processors per parallel job. However, individual microprocessor performance remains a critical component of overall performance. Traditional approaches to improving scientific application performance concentrate on floating-point (FP) instructions; however, our studies show that in the scientific applications used at Sandia National Labs, integer instructions constitute a large and critical part of the instruction mix. Although the SPEC-FP benchmark suite is considered representative of FP workloads, it has a much smaller proportion of integer computation instructions than the Sandia scientific applications: 22.9% compared to 36.9%. Integer instructions in Sandia applications also behave differently than in SPEC-FP. Integer instruction outputs are reused 8.8x to 13.1x more often in SPEC-FP benchmarks, and integer dataflow in Sandia applications is more complex than in the SPEC-FP suite. In this work, we examine common dataflow and usage patterns of integer instructions, information essential to developing hardware techniques to accelerate critical scientific applications. We present statistics for SPEC-FP and Sandia applications, summarizing integer computation usage and the size, shape, and interface (number of inputs/outputs) of dataflow graphs.