Ten lectures on wavelets
Efficient simulation of caches under optimal replacement with applications to miss characterization
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A program data flow analysis procedure
Communications of the ACM
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Efficient representations and abstractions for quantifying and exploiting data reference locality
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
IEEE Transactions on Computers
Managing multi-configuration hardware via dynamic working set analysis
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Introduction To Automata Theory, Languages, And Computation
Introduction To Automata Theory, Languages, And Computation
International Journal of Parallel Programming
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
The design, implementation, and evaluation of a compiler algorithm for CPU energy reduction
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Compile-time composition of run-time data and iteration reorderings
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Measurements of major locality phases in symbolic reference strings
SIGMETRICS '76 Proceedings of the 1976 ACM SIGMETRICS conference on Computer performance modeling measurement and evaluation
Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor
Proceedings of the 30th annual international symposium on Computer architecture
Positional adaptation of processors: application to energy reduction
Proceedings of the 30th annual international symposium on Computer architecture
Dynamically managing the communication-parallelism trade-off in future clustered processors
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the 30th annual international symposium on Computer architecture
Efficient remapping mechanisms for an adaptable memory system
Efficient remapping mechanisms for an adaptable memory system
Characterizing and Predicting Program Behavior and its Variability
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Comparing Program Phase Detection Techniques
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Array regrouping and structure splitting using whole-program reference affinity
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Cross-architecture performance predictions for scientific applications using parameterized models
Proceedings of the joint international conference on Measurement and modeling of computer systems
Restructuring computations for temporal data cache locality
International Journal of Parallel Programming
Reuse-distance-based miss-rate prediction on a per instruction basis
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Wavelet Analysis for Microprocessor Design: Experiences with Wavelet-Based dI/dt Characterization
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Identifying hierarchical structure in sequences: a linear-time algorithm
Journal of Artificial Intelligence Research
Effective Adaptive Computing Environment Management via Dynamic Optimization
Proceedings of the international symposium on Code generation and optimization
Maintaining Consistency and Bounding Capacity of Software Code Caches
Proceedings of the international symposium on Code generation and optimization
Instruction Based Memory Distance Analysis and its Application
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
A hierarchical model of data locality
Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Gated memory control for memory monitoring, leak detection and garbage collection
Proceedings of the 2005 workshop on Memory system performance
Online Phase Detection Algorithms
Proceedings of the International Symposium on Code Generation and Optimization
Selecting Software Phase Markers with Code Structure Analysis
Proceedings of the International Symposium on Code Generation and Optimization
Phase-based visualization and analysis of Java programs
Science of Computer Programming - Special issue: Principles and practices of programming in Java (PPPJ 2004)
Profile-guided proactive garbage collection for locality optimization
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Wavelet-based phase classification
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Complexity-based program phase analysis and classification
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Performance Modeling of Communication and Computation in Hybrid MPI and OpenMP Applications
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 2
HeapMD: identifying heap-based bugs using anomaly detection
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Feedback-directed memory disambiguation through store distance analysis
Proceedings of the 20th annual international conference on Supercomputing
Effective management of multiple configurable units using dynamic optimization
ACM Transactions on Architecture and Code Optimization (TACO)
Locality approximation using time
Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Software behavior oriented parallelization
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Data layouts for object-oriented programs
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Predicting locality phases for dynamic memory optimization
Journal of Parallel and Distributed Computing
Analysis of input-dependent program behavior using active profiling
Proceedings of the 2007 workshop on Experimental computer science
Analysis of input-dependent program behavior using active profiling
ecs'07 Experimental computer science on Experimental computer science
Phase-based adaptive recompilation in a JVM
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Automatic software interference detection in parallel applications
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Phase-based cache reconfiguration for a highly-configurable two-level cache hierarchy
Proceedings of the 18th ACM Great Lakes symposium on VLSI
HMTT: a platform independent full-system memory trace monitoring system
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Automatic analysis of speedup of MPI applications
Proceedings of the 22nd annual international conference on Supercomputing
Sampling-based program locality approximation
Proceedings of the 7th international symposium on Memory management
Feature-level phase detection for execution trace using object cache
WODA '08 Proceedings of the 2008 international workshop on dynamic analysis: held in conjunction with the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2008)
Exploration of the Influence of Program Inputs on CMP Co-scheduling
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Modeling Relations between Inputs and Dynamic Behavior for General Programs
Languages and Compilers for Parallel Computing
Analysis and approximation of optimal co-scheduling on chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Quick and Practical Run-Time Evaluation of Multiple Program Optimizations
Transactions on High-Performance Embedded Architectures and Compilers I
Combining Edge Vector and Event Counter for Time-Dependent Power Behavior Characterization
Transactions on High-Performance Embedded Architectures and Compilers II
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Phase detection using trace compilation
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Virtual reuse distance analysis of SPECjvm2008 data locality
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Locality behavior of parallel and sequential algorithms for irregular graph problems
PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Efficient program power behavior characterization
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Phase complexity surfaces: characterizing time-varying program behavior
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Lightweight, robust adaptivity for software transactional memory
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Automatic Phase Detection and Structure Extraction of MPI Applications
International Journal of High Performance Computing Applications
Accelerating multicore reuse distance analysis with sampling and parallelization
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An input-centric paradigm for program dynamic optimizations
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Program phase detection and exploitation
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Dynamic program phase detection in distributed shared- memory multiprocessors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A Predictive Model for Dynamic Microarchitectural Adaptivity Control
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Analysis of application-aware on-chip routing under traffic uncertainty
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Low cost working set size tracking
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
BarrierWatch: characterizing multithreaded workloads across and within program-defined epochs
Proceedings of the 8th ACM International Conference on Computing Frontiers
A study on the locality behavior of minimum spanning tree algorithms
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
A transactional memory with automatic performance tuning
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Discovery of locality-improving refactorings by reuse path analysis
HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
A practical method for quickly evaluating program optimizations
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Dynamic code region (DCR) based program phase tracking and prediction for dynamic optimizations
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Phase-Based miss rate prediction across program inputs
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Characterizing time-varying program behavior using phase complexity surfaces
Transactions on High-Performance Embedded Architectures and Compilers IV
Combining locality analysis with online proactive job co-scheduling in chip multiprocessors
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Path-Based reuse distance analysis
CC'06 Proceedings of the 15th international conference on Compiler Construction
Predictive power management for multi-core processors
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Phase-based tuning for better utilization of performance-asymmetric multicore processors
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Improving dynamic prediction accuracy through multi-level phase analysis
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Phase guided profiling for fast cache modeling
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Extracting the optimal sampling frequency of applications using spectral analysis
Concurrency and Computation: Practice & Experience
Exploiting inter-sequence correlations for program behavior prediction
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Inferred Models for Dynamic and Sparse Hardware-Software Spaces
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
A survey on cache tuning from a power/energy perspective
ACM Computing Surveys (CSUR)
Multi-level phase analysis for sampling simulation
Proceedings of the Conference on Design, Automation and Test in Europe
Imbalanced cache partitioning for balanced data-parallel programs
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Dynamic microarchitectural adaptation using machine learning
ACM Transactions on Architecture and Code Optimization (TACO)
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
As computer memory hierarchy becomes adaptive, its performance increasingly depends on forecasting the dynamic program locality. This paper presents a method that predicts the locality phases of a program by a combination of locality profiling and run-time prediction. By profiling a training input, it identifies locality phases by sifting through all accesses to all data elements using variable-distance sampling, wavelet filtering, and optimal phase partitioning. It then constructs a phase hierarchy through grammar compression. Finally, it inserts phase markers into the program using binary rewriting. When the instrumented program runs, it uses the first few executions of a phase to predict all its later executions.Compared with existing methods based on program code and execution intervals, locality phase prediction is unique because it uses locality profiles, and it marks phase boundaries in program code. The second half of the paper presents a comprehensive evaluation. It measures the accuracy and the coverage of the new technique and compares it with best known run-time methods. It measures its benefit in adaptive cache resizing and memory remapping. Finally, it compares the automatic analysis with manual phase marking. The results show that locality phase prediction is well suited for identifying large, recurring phases in complex programs.