A major concern with high-performance general-purpose workstations is speeding up the execution of commands, uniprocess applications, and multiprocess applications with coarse- to medium-grain parallelism. To that end, a simple extension of a uniprocessor machine, such as a shared-bus, shared-memory architecture, can be employed. Both kinds of machines generally use the same OS model, and the same application can execute on either without recoding. However, an intrinsic limitation of the shared-bus architecture is the low number of processors that can be connected to the shared bus. When this number exceeds a critical value, the system's global performance drops drastically because of bus saturation. Cache coherence adds to the bus traffic: when two or more processors store a copy of the same memory block in their respective caches and one of them writes to a location in that block, a set of bus actions is necessary to guarantee that every subsequent read by any processor gets the up-to-date value of the modified location. Typically, researchers use simulation to investigate how to improve the performance of such machines. In particular, trace-driven simulation offers a good trade-off between speed, accuracy, and flexibility. A key point of this methodology is to find traces that both represent typical operating conditions and include all the information potentially needed for an accurate simulation of the system. The authors have developed a methodology and a set of tools, called Trace Factory, to generate traces for the performance evaluation of shared-bus, shared-memory multiprocessor systems. Trace Factory is particularly useful for evaluating how a multiprocessor architecture performs under different workloads and under most of the operating-system activities that influence performance. The designer can evaluate and tune architectural solutions for the coherence protocol, cache structure, bus, and memory.
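To make the coherence and trace-driven ideas above concrete, the following is a minimal sketch of a trace-driven simulation for a toy write-invalidate snooping protocol. It is not Trace Factory and does not reproduce its trace format: the `(cpu, op, block)` record layout, the function name `simulate`, the two-state (`"M"`/`"S"`) protocol, and the assumption of infinitely large caches with no timing model are all illustrative choices made here.

```python
def simulate(trace, num_cpus):
    """Replay (cpu, op, block) records through a toy write-invalidate protocol.

    Assumptions (hypothetical, for illustration only): caches never evict,
    every bus transaction costs the same, and op is "R" (read) or "W" (write).
    """
    # state[cpu][block] is "M" (modified) or "S" (shared); absent means invalid.
    state = [dict() for _ in range(num_cpus)]
    stats = {"bus_reads": 0, "bus_invalidates": 0, "hits": 0}

    for cpu, op, block in trace:
        mine = state[cpu].get(block)
        if op == "R":
            if mine is not None:
                stats["hits"] += 1  # a valid local copy serves the read
            else:
                stats["bus_reads"] += 1  # miss: fetch the block over the bus
                # A modified copy held elsewhere is downgraded to shared.
                for other in range(num_cpus):
                    if other != cpu and state[other].get(block) == "M":
                        state[other][block] = "S"
                state[cpu][block] = "S"
        elif op == "W":
            if mine == "M":
                stats["hits"] += 1  # exclusive owner: no bus action needed
            else:
                # Write to a shared or invalid block: one bus transaction
                # that invalidates every other cached copy.
                stats["bus_invalidates"] += 1
                for other in range(num_cpus):
                    if other != cpu:
                        state[other].pop(block, None)
                state[cpu][block] = "M"
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return stats


# Two processors repeatedly sharing and writing the same block.
example_trace = [
    (0, "R", 0x10), (1, "R", 0x10), (0, "W", 0x10),
    (1, "R", 0x10), (0, "W", 0x10), (0, "W", 0x10),
]
result = simulate(example_trace, num_cpus=2)
print(result)
```

In this example only the final write is served without bus activity; every other access either fetches the block or invalidates the other processor's copy, which is the kind of coherence-induced traffic that contributes to bus saturation as the processor count grows. A realistic study would add finite caches, a bus timing model, and traces that include operating-system activity.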