While database workloads consume a major fraction of the cycles in today's machines, there are only a few public-domain performance studies that characterize in detail how these workloads exercise the machines. This fact is due to the complexity of setting up and tuning database workloads, the high cost of the equipment required to evaluate them, and the frequent use of proprietary systems.

In this paper, we help redress this problem by presenting a detailed performance characterization of the TPC-D benchmark running on a Quad Pentium Pro SMP multiprocessor with Windows NT and Microsoft's SQL Server. We use the Pentium Pro built-in hardware counters and a software tool that monitors system activity. Our results show that TPC-D queries have a relatively low CPI. The CPIs, which are 1.27 on average for the 17 read-only queries, are comparable to values observed for technical workloads. The major factors inhibiting lower CPIs are the instruction fetch bottleneck and data misses in the secondary cache. Kernel time is negligible: queries spend less than 6% of their time on average in the kernel.

Other results show that static branch prediction is effective in TPC-D, that the exclusive state in the cache tags is largely unnecessary, and that the use of indexing techniques is quite useful in saving I/O operations. Finally, we compare our results to the ones published for TPC-C.
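
As a rough illustration of how metrics such as CPI and kernel-time share can be derived from raw hardware-counter totals, the following minimal sketch may help. It is not the paper's tooling: the `CounterSample` type, the field names, and all counter values are hypothetical, and averaging per-query CPIs with an arithmetic mean is an assumption about how an "average CPI" could be computed.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Counter totals for one query run (hypothetical values, not from the paper)."""
    query: str
    cycles: int          # total clock cycles while the query ran
    instructions: int    # instructions retired
    kernel_cycles: int   # portion of `cycles` spent in kernel mode


def cpi(sample: CounterSample) -> float:
    """Cycles per instruction for a single query."""
    return sample.cycles / sample.instructions


def kernel_fraction(sample: CounterSample) -> float:
    """Fraction of a query's cycles spent in the kernel."""
    return sample.kernel_cycles / sample.cycles


def mean(values) -> float:
    """Arithmetic mean (assumed averaging method)."""
    values = list(values)
    return sum(values) / len(values)


# Made-up inputs, chosen only to make the arithmetic easy to follow.
samples = [
    CounterSample("Q1", cycles=1_300_000, instructions=1_000_000, kernel_cycles=52_000),
    CounterSample("Q2", cycles=2_400_000, instructions=2_000_000, kernel_cycles=120_000),
]

average_cpi = mean(cpi(s) for s in samples)
average_kernel = mean(kernel_fraction(s) for s in samples)

print(f"average CPI: {average_cpi:.2f}")
print(f"average kernel share: {average_kernel:.1%}")
```

Note that the mean of per-query CPIs generally differs from total cycles divided by total instructions across all queries; which of the two is meant by an "average CPI" depends on the study's methodology.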