Cache performance of operating system and multiprogramming workloads
ACM Transactions on Computer Systems (TOCS)
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Cache and memory hierarchy design: a performance-directed approach
Cache and memory hierarchy design: a performance-directed approach
Computer - Special issue on experimental research in computer architecture
Instruction level profiling and evaluation of the IBM/6000
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Transaction processing performance on PA-RISC commercial Unix systems
COMPCON '92 Proceedings of the thirty-seventh international conference on COMPCON
Characterizing the caching and synchronization performance of a multiprocessor operating system
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
The impact of operating system structure on memory system performance
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
Characterization of alpha AXP performance using TP and SPEC workloads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Commercial workload performance in the IBM POWER2 RISC System/6000 processor
IBM Journal of Research and Development
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Cache evaluation and the impact of workload choice
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Cache Performance in the VAX-11/780
ACM Transactions on Computer Systems (TOCS)
Active memory: a new abstraction for memory-system simulation
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The impact of architectural trends on operating system performance
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Instruction fetching: coping with code bloat
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Evaluation of multithreaded uniprocessors for commercial application environments
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The structure and performance of interpreters
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Active memory: a new abstraction for memory system simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Trace-driven memory simulation: a survey
ACM Computing Surveys (CSUR)
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
Prediction caches for superscalar processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Efficient instruction cache simulation and execution profiling with a threaded-code interpreter
Proceedings of the 29th conference on Winter simulation
Exploiting idle floating-point resources for integer execution
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Portable, continuous recording of complete computer behavior with low overhead (extended abstract)
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Memory system characterization of commercial workloads
Proceedings of the 25th annual international symposium on Computer architecture
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads
Proceedings of the 25th annual international symposium on Computer architecture
Execution characteristics of desktop applications on Windows NT
Proceedings of the 25th annual international symposium on Computer architecture
An analysis of database workload performance on simultaneous multithreaded processors
Proceedings of the 25th annual international symposium on Computer architecture
Predicting the performance of distributed virtual shared-memory applications
IBM Systems Journal
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Prefetching Using Markov Predictors
IEEE Transactions on Computers - Special issue on cache memory and related problems
Comprehensive Hardware and Software Support for Operating Systems to Exploit MP Memory Hierarchies
IEEE Transactions on Computers
IEEE Transactions on Computers
Memory system behavior of Java programs: methodology and analysis
Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
An analytical model of the working-set sizes in decision-support systems
Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Reconfigurable caches and their application to media processing
Proceedings of the 27th annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Architectural support for copy and tamper resistant software
ACM SIGPLAN Notices
Architectural and compiler support for effective instruction prefetching: a cooperative approach
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Architectural support for copy and tamper resistant software
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
An analysis of operating system behavior on a simultaneous multithreaded architecture
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Code layout optimizations for transaction processing workloads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Understanding the impact of X86/NT computing on microarchitecture
Workload characterization of emerging computer applications
Execution history guided instruction prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
The optimum pipeline depth for a microprocessor
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Software Trace Cache for Commercial Applications
International Journal of Parallel Programming
System Optimization for OLTP Workloads
IEEE Micro
Comparing the Memory System Performance of DSS Workloads on the HP V-Class and SGI Origin 2000
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
DBMSs on a Modern Processor: Where Does Time Go?
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Boosting the Performance of Three-Tier Web Servers Deploying SMP Architecture
Revised Papers from the NETWORKING 2002 Workshops on Web Engineering and Peer-to-Peer Computing
Trace-Driven Memory Simulation: A Survey
Performance Evaluation: Origins and Directions
Quantifying behavioral differences between multimedia and general-purpose workloads
Journal of Systems Architecture: the EUROMICRO Journal
Improving the Data Cache Performance of Multiprocessor Operating Systems
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Modeling and Analysis of Task Migration in Shared-Memory Multiprocessor Computer Systems
MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
The impact of extrinsic cache performance on predictability of real-time systems
RTCSA '95 Proceedings of the 2nd International Workshop on Real-Time Computing Systems and Applications
Real time aspects of cluster based caches
RTCSA '95 Proceedings of the 2nd International Workshop on Real-Time Computing Systems and Applications
Behavior and Performance of Interactive Multi-Player Game Servers
Cluster Computing
Call graph prefetching for database applications
ACM Transactions on Computer Systems (TOCS)
Execution History Guided Instruction Prefetching
The Journal of Supercomputing
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism
Proceedings of the 31st annual international symposium on Computer architecture
Buffering databse operations for enhanced instruction cache performance
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
The implications of working set analysis on supercomputing memory hierarchy design
Proceedings of the 19th annual international conference on Supercomputing
Reducing Server Data Traffic Using a Hierarchical Computation Model
IEEE Transactions on Parallel and Distributed Systems
Store Memory-Level Parallelism Optimizations for Commercial Applications
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Application analysis using memory pressure
Proceedings of the 2005 workshop on Memory system performance
Characterization of simultaneous multithreading (SMT) efficiency in POWER5
IBM Journal of Research and Development - POWER5 and packaging
Improving instruction cache performance in OLTP
ACM Transactions on Database Systems (TODS)
WCET analysis of instruction caches with prefetching
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
IEEE Transactions on Computers
Experience distributing objects in an SMMP OS
ACM Transactions on Computer Systems (TOCS)
Steps towards cache-resident transaction processing
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Speeding-up multiprocessors running DBMS workloads through coherence protocols
International Journal of High Performance Computing and Networking
Breaking the memory wall in MonetDB
Communications of the ACM - Surviving the data deluge
Analyzing the worst-case execution time for instruction caches with prefetching
ACM Transactions on Embedded Computing Systems (TECS)
A performance methodology for commercial servers
IBM Journal of Research and Development
A multithreaded PowerPC processor for commercial servers
IBM Journal of Research and Development
Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Power-aware multi-core simulation for early design stage hardware/software co-optimization
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Hi-index | 0.01 |
Experience has shown that many widely used benchmarks are poor predictors of the performance of systems running commercial applications. Research into this anomaly has long been hampered by a lack of address traces from representative multi-user commercial workloads. This paper presents research, using traces of industry-standard commercial benchmarks, which examines the characteristic differences between technical and commercial workloads and illustrates how those differences affect cache performance.Commercial and technical environments differ in their respective branch behavior, operating system activity, I/O, and dispatching characteristics. A wide range of uniprocessor instruction and data cache geometries were studied. The instruction cache results for commercial workloads demonstrate that instruction cache performance can no longer be neglected because these workloads have much larger code working sets than technical applications. For database workloads, a breakdown of kernel and user behavior reveals that the application component can exhibit behavior similar to the operating system and therefore, can experience miss rates equally high. This paper also indicates that “dispatching” or process switching characteristics must be considered when designing level-two caches. The data presented shows that increasing the associativity of second-level caches can reduce miss rates significantly. Overall, the results of this research should help system designers choose a cache configuration that will perform well in commercial markets.