Most publicly available simulation tools simulate only RISC architectures. These tools cannot capture the instruction mix and memory reference patterns of CISC architectures. We present an overview of Augmint, an execution-driven multiprocessor simulation toolkit that fills this gap by supporting Intel x86 architectures. Augmint also supports trace-driven simulation for uniprocessors as well as multiprocessors, with minor effort on the part of simulator developers. Augmint runs m4-macro-extended C and C++ applications such as those in the SPLASH and SPLASH-2 benchmark suites. Augmint supports a thread-based programming model with a shared global address space and private stack space. Augmint provides a simulator interface compatible with that of the MINT simulation toolkit for MIPS architectures, thus allowing the reuse of most architecture simulators written for MINT. Augmint simulations run on x86-based uniprocessor systems under Unix or Windows NT. The source code of Augmint is publicly available from http://www.csrd.uiuc.edu/iacoma/augmint.
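To make the idea of trace-driven simulation concrete, the following is a minimal sketch of an architecture simulator back end that consumes a stream of memory references and models a small direct-mapped cache. It is an illustration only: the type and function names (mem_ref_t, cache_t, cache_init, cache_access, run_trace) and the cache geometry are assumptions made for this example and are not taken from the Augmint or MINT interfaces.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memory-reference record; NOT the Augmint/MINT event type. */
typedef struct {
    int      cpu;       /* simulated processor that issued the reference */
    uint32_t addr;      /* 32-bit address, as on x86 */
    int      is_write;  /* nonzero for a store, zero for a load */
} mem_ref_t;

/* Assumed geometry: 64 sets of 32-byte lines, direct-mapped. */
#define NUM_SETS   64u
#define LINE_BYTES 32u

typedef struct {
    uint32_t      tag[NUM_SETS];
    int           valid[NUM_SETS];
    unsigned long hits;
    unsigned long misses;
} cache_t;

/* Reset the cache to an empty state with zeroed counters. */
static void cache_init(cache_t *c)
{
    memset(c, 0, sizeof *c);
}

/* Look up one reference; returns 1 on a hit, 0 on a miss.
 * On a miss the line is filled, evicting whatever occupied the set. */
static int cache_access(cache_t *c, const mem_ref_t *r)
{
    uint32_t line = r->addr / LINE_BYTES;
    uint32_t set  = line % NUM_SETS;
    uint32_t tag  = line / NUM_SETS;

    if (c->valid[set] && c->tag[set] == tag) {
        c->hits++;
        return 1;
    }
    c->valid[set] = 1;
    c->tag[set]   = tag;
    c->misses++;
    return 0;
}

/* Trace-driven mode: replay a recorded sequence of references. */
static void run_trace(cache_t *c, const mem_ref_t *trace, size_t n)
{
    for (size_t i = 0; i < n; i++)
        cache_access(c, &trace[i]);
}
```

In an execution-driven setup, a routine like cache_access would instead be called by the front end each time the running application issues a memory reference, rather than from a recorded trace. The sketch models a single cache and ignores the cpu field; a multiprocessor back end would keep one cache per simulated processor and add a coherence protocol.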