An approach to scalability study of shared memory parallel systems
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Properties of the working-set model
Communications of the ACM
An Architectural Framework for Runtime Optimization
IEEE Transactions on Computers
Managing multi-configuration hardware via dynamic working set analysis
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
X-means: Extending K-means with Efficient Estimation of the Number of Clusters
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
A framework for performance modeling and prediction
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Vacuum packing: extracting hardware-detected program phases for post-link optimization
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 30th annual international symposium on Computer architecture
Characterizing and Predicting Program Behavior and its Variability
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Picking Statistically Valid and Early Simulation Points
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Redeeming IPC as a Performance Metric for Multithreaded Programs
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Comparing Program Phase Detection Techniques
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Method-level phase behavior in java workloads
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
The Fuzzy Correlation between Code and Performance Predictability
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Transition Phase Classification and Prediction
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
A co-phase matrix to guide simultaneous multithreading simulation
ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Structures for phase classification
ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
The Strong correlation Between Code Signatures and Performance
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Complexity-based program phase analysis and classification
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Phase guided sampling for efficient parallel application simulation
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Performance modeling of parallel applications for grid scheduling
Journal of Parallel and Distributed Computing
Multiprocessor performance estimation using hybrid simulation
Proceedings of the 45th annual Design Automation Conference
Scalable load-balance measurement for SPMD codes
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Communication Based Proactive Link Power Management
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Dynamic performance tuning for speculative threads
Proceedings of the 36th annual international symposium on Computer architecture
Accelerating multi-core simulators
Proceedings of the 2010 ACM Symposium on Applied Computing
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
BarrierWatch: characterizing multithreaded workloads across and within program-defined epochs
Proceedings of the 8th ACM International Conference on Computing Frontiers
Communication based proactive link power management
Transactions on High-Performance Embedded Architectures and Compilers IV
Phase guided profiling for fast cache modeling
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Dynamically dispatching speculative threads to improve sequential execution
ACM Transactions on Architecture and Code Optimization (TACO)
A survey on cache tuning from a power/energy perspective
ACM Computing Surveys (CSUR)
Imbalanced cache partitioning for balanced data-parallel programs
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
PCantorSim: Accelerating parallel architecture simulation through fractal-based sampling
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Most programs are repetitive, where similar behavior can be seen at different execution times. Algorithms have been proposed that automatically group similar portions of a program's execution into phases, where samples of execution in the same phase have homogeneous behavior and similar resource requirements. In this paper, we examine applying these phase analysis algorithms and how to adapt them to parallel applications running on shared memory processors. Our approach relies on a separate representation of each thread's activity. We first focus on showing its ability to identify similar intervals of execution across threads for a single run. We then show that it is effective at identifying similar behavior of a program when the number of threads is varied between runs. This can be used by developers to examine how different phases scale across different number of threads. Finally, we examine using the phase analysis to pick simulation points to guide multithreaded simulation.