In 1995, the IBM® AS/400® changed the architecture of its microprocessors to the 64-bit PowerPC AS™. This entailed not only developing new processors but also a very large software project. As is typical of such a large software project, the first release emphasized solid design, good encapsulation, and enhanced functionality, with most of the performance optimizations deferred to subsequent releases. To ensure that the system would perform optimally for large and complex OLTP workloads running on the second generation of hardware (which grew the largest SMP from a 4-way to a 12-way), a team of software and hardware experts from several areas was assembled to analyze the system and identify bottlenecks and areas for improvement. This paper describes the tools and methodology the team used to perform the analysis and identify the performance improvements. The effort resulted in 25,149 tpmC on the TPC-C benchmark on the new 12-way system. When this system first became available in August 1997, this was the fourth-highest tpmC result overall and the highest tpmC of any single (non-cluster) system available.