Memory trace analysis is an important technique for architecture research, system software (e.g., OS and compiler) optimization, and application performance improvement. Many approaches have been used to collect memory traces, such as simulation, binary instrumentation, and hardware snooping. However, they usually suffer from limitations in time, accuracy, and capacity. In this paper we propose a platform-independent memory trace monitoring system that can track the virtual memory reference trace of full systems (including OS, VMMs, libraries, and applications). The system adopts a DIMM-snooping mechanism, in which hardware boards plugged into DIMM slots snoop memory traffic. This approach has several advantages: it is fast, complete, undistorted, and portable. Three key techniques are proposed to address the system design challenges with this mechanism: (1) To keep up with memory speeds, the DDR protocol state machine is simplified, and large FIFOs are added between the state machine and the trace transmitting logic to handle burst memory accesses; (2) To reconstruct the physical-to-virtual mapping and distinguish one process's address space from others, we develop an OS kernel module that collects page table information and a synchronization mechanism that synchronizes the page table information with the memory trace; (3) To dump massive trace data, we employ a straightforward method to compress the trace and use Gigabit Ethernet and RAID to send and receive the compressed trace. We present our implementation of an initial monitoring system, named HMTT (Hyper Memory Trace Tracker). Using HMTT, we have observed that burst bandwidth utilization is much larger than average bandwidth utilization, by up to 5X in desktop applications. We have also confirmed that the stream memory accesses of many applications account for more than 40% of L2 Cache misses, and that OS virtual memory management may decrease stream accesses as seen by the memory controller (or L2 Cache) by up to 30.2%.
Moreover, we have evaluated OS impact on memory performance in real systems. The evaluations and case studies show the feasibility and effectiveness of our proposed monitoring mechanism and techniques.
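The second technique above, recovering virtual addresses from a snooped physical trace, can be illustrated with a small sketch. This is not HMTT's implementation: the record types `PageMapEvent` and `MemAccess`, the function `reconstruct`, and the 4 KiB page size are all assumptions made for illustration. The sketch also assumes that page-table updates and memory accesses have already been merged into a single time-ordered stream, which is the job of the synchronization mechanism described in the abstract, and that each physical frame has at most one owner (no shared pages).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple, Union

PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


@dataclass(frozen=True)
class PageMapEvent:
    """Hypothetical page-table update reported by a kernel module."""
    pid: int
    vpn: int                         # virtual page number
    pfn: Optional[int]               # physical frame number; None means unmap


@dataclass(frozen=True)
class MemAccess:
    """Hypothetical physical access record captured by the snooping hardware."""
    paddr: int
    is_write: bool


Event = Union[PageMapEvent, MemAccess]
Resolved = Tuple[Optional[int], Optional[int], bool]   # (pid, vaddr, is_write)


def reconstruct(events: Iterable[Event]) -> Iterator[Resolved]:
    """Replay a time-ordered event stream and yield virtual-address accesses.

    Accesses to frames with no known owner are yielded as (None, None, is_write).
    """
    forward: Dict[Tuple[int, int], int] = {}   # (pid, vpn) -> pfn
    reverse: Dict[int, Tuple[int, int]] = {}   # pfn -> (pid, vpn)

    for ev in events:
        if isinstance(ev, PageMapEvent):
            key = (ev.pid, ev.vpn)
            old_pfn = forward.pop(key, None)
            # Drop the stale reverse entry only if this page still owns the frame.
            if old_pfn is not None and reverse.get(old_pfn) == key:
                del reverse[old_pfn]
            if ev.pfn is not None:
                forward[key] = ev.pfn
                reverse[ev.pfn] = key
        else:
            owner = reverse.get(ev.paddr >> PAGE_SHIFT)
            if owner is None:
                yield (None, None, ev.is_write)
            else:
                pid, vpn = owner
                vaddr = (vpn << PAGE_SHIFT) | (ev.paddr & PAGE_MASK)
                yield (pid, vaddr, ev.is_write)


if __name__ == "__main__":
    stream = [
        PageMapEvent(pid=1, vpn=0x10, pfn=0x80),
        MemAccess(paddr=0x80123, is_write=False),
        PageMapEvent(pid=1, vpn=0x10, pfn=None),   # page unmapped
        MemAccess(paddr=0x80123, is_write=False),  # owner now unknown
    ]
    for pid, vaddr, is_write in reconstruct(stream):
        print(pid, None if vaddr is None else hex(vaddr), is_write)
```

The reverse map keyed by physical frame is what lets each snooped access be attributed to a process; a real system would also have to handle shared pages and events that arrive out of order, which this sketch leaves out.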