Integrating performance monitoring and communication in parallel computers
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
The SHRIMP performance monitor: design and applications
SPDT '96 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Evaluation of a competitive-update cache coherence protocol with migratory data detection
Journal of Parallel and Distributed Computing
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Relational profiling: enabling thread-level parallelism in virtual machines
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Rapid profiling via stratified sampling
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Pentium 4 Performance-Monitoring Features
IEEE Micro
Automatic Verification of Parameterized Cache Coherence Protocols
CAV '00 Proceedings of the 12th International Conference on Computer Aided Verification
Interactive locality optimization on NUMA architectures
Proceedings of the 2003 ACM symposium on Software visualization
Optimizing Data Locality for SCI-Based PC-Clusters with the SMiLE Monitoring Approach
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes
Proceedings of the 30th annual international symposium on Computer architecture
A "flight data recorder" for enabling full-system multiprocessor deterministic replay
Proceedings of the 30th annual international symposium on Computer architecture
Using Interaction Costs for Microarchitectural Bottleneck Analysis
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
iWatcher: Efficient Architectural Support for Software Debugging
Proceedings of the 31st annual international symposium on Computer architecture
Dynamic tracking of page miss ratio curve for memory management
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Owl: next generation system monitoring
Proceedings of the 2nd conference on Computing frontiers
Portable and accurate sampling profiling for Java
Software—Practice & Experience - Research Articles
The M5 Simulator: Modeling Networked Systems
IEEE Micro
Phoenix: Detecting and Recovering from Permanent Processor Design Bugs with Programmable Hardware
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Next-Generation Performance Counters: Towards Monitoring Over Thousand Concurrent Events
ISPASS '08 Proceedings of the ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Hi-index | 0.00 |
As we reach the limits of single-core computing, we are promised more and more cores in our systems. Modern architectures include many performance counters per core, but few or no inter-core counters. In fact, performance counters were not designed to be exploited by users, as they now are, but simply as aids for hardware debugging and testing during system creation. As such, they tend to be an "after thought" in the design, with no standardization across or within platforms. Nonetheless, given access to these counters, researchers are using them to great advantage [17]. Furthermore, evaluating counters for multicore systems has become a complex and resource consuming task. We propose a Performance Monitoring System consisting of a specialized CPU core designed to allow efficient collection and evaluation of performance data for both static and dynamic optimizations. Our system provides a transparent mechanism to change architectural features dynamically, inform the Operating System of process behaviors, and assist in profiling and debugging. For instance, a piece of hardware watching snoop packets can determine when a write-update cache coherence protocol would be helpful or detrimental to the currently running program. Our system is designed to allow the hardware to feed performance statistics back to software, allowing dynamic architectural adjustments at runtime.