The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Embra: fast and flexible machine simulation
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Coherence controller architectures for SMP-based CC-NUMA multiprocessors
Proceedings of the 24th annual international symposium on Computer architecture
Reactive NUMA: a design for unifying S-COMA and CC-NUMA
Proceedings of the 24th annual international symposium on Computer architecture
Multicast snooping: a new coherence method using a multicast address network
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Scaling application performance on a cache-coherent multiprocessor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Large scale molecular dynamics simulations with fast multipole implementations
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Complete Computer System Simulation: The SimOS Approach
IEEE Parallel & Distributed Technology: Systems & Technology
Rapid Hardware Prototyping on RPM-2
IEEE Design & Test
The Augmint multiprocessor simulation toolkit for Intel x86 architectures
ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
IPPS '99/SPDP '99 Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Shared cache architectures for decision support systems
Performance Evaluation
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Characterization of L3 cache behavior of SPECjAppServer2002 and TPC-C
Proceedings of the 19th annual international conference on Supercomputing
Design, implementation, and verification of active cache emulator (ACE)
Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays
A low-cost memory remapping scheme for address bus protection
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
HMTT: a platform independent full-system memory trace monitoring system
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A performance methodology for commercial servers
IBM Journal of Research and Development
A low-cost memory remapping scheme for address bus protection
Journal of Parallel and Distributed Computing
Performance of large low-associativity caches
ACM SIGMETRICS Performance Evaluation Review
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
HMTT: A hybrid hardware/software tracing system for bridging the DRAM access trace's semantic gap
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Modern system design often requires multiple levels of simulation for design validation and performance debugging. However, while machines have gotten faster, and simulators have become more detailed, simulation speeds have not tracked machine speeds, As a result, it is difficult to simulate realistic problem sizes and hardware configurations for a target machine. Instead, researchers have focussed on developing sealing methodologies and running smaller problem sizes and configurations that attempt to represent the behavior of the real problem. Given the increasing size of problems today, it is unclear whether such an approach yields accurate results. Moreover, although commercial workloads are prevalent and important in today's marketplace, many simulation tools are unable to adequately profile such applications, let alone for realistic sizes.In this paper we present a hardware-based emulation tool that can be used to aid memory system designers. Our focus is on the memory system because the ever-widening gap between processor and memory speeds means that optimizing the memory subsystem is critical for performance. We present the design of the Memory Instrumentation and Emulation System (MemoriES). MemoriES is a programmable tool designed using FPGAs and SDRAMs. It plugs into an SMP bus to perform on-line emulation of several cache configurations, structures and protocols while the system is running real-life workloads in real-time, without any slowdown in application execution speed. We demonstrate its usefulness in several case studies, and find several important results. First, using traces to perform system evaluation can lead to incorrect results (off by 100% or more in some cases) if the trace size is not sufficiently large. Second. MemoriES is able to detect performance problems by profiling miss behavior over the entire course of a run, rather than relying on a small interval of time. Finally, we observe that previous studies of SPLASH2 applications using scaled application sizes can result in optimistic miss rates relative to real sizes on real machines, providing potentially misleading data when used for design evaluation.