A view of database system performance measures

  • Authors:
  • Jim Gray

  • Affiliations:
  • Tandem Computers, Cupertino, California

  • Venue:
  • SIGMETRICS '87 Proceedings of the 1987 ACM SIGMETRICS conference on Measurement and modeling of computer systems
  • Year:
  • 1987

Abstract

Database systems allow quick creation of performance problems. The goal of database systems is to allow the computer-illiterate to write complex and complete applications. It is the job of the system to translate a high-level description of data and procedures into efficient algorithms. The REAL performance metric of a system is how successfully it meets these goals.

Practitioners use a much narrower definition of system performance. They assume a standard workload and measure performance by peak throughput and by dollar cost per transaction.

Although many vendors have “private” performance measures, Bitton, DeWitt, and Turbyfill were the first to publish a measure of database system performance [Bitton]. Their measure, here called the Wisconsin benchmark, consists of a database design, a set of 32 retrieval and update statements, and a script for multi-user tests. They give two performance metrics: the elapsed time for each statement and the throughput of the system when running sixteen simultaneous scripts. No response time requirement or cost measure is included in the definition. The Wisconsin benchmark is the most widely used database benchmark.

Largely in response to the Wisconsin benchmark, an informal group including Bitton and DeWitt defined a benchmark more representative of transaction processing applications [Anon]. Its workload is:

  • SCAN - a mini-batch operation to sequentially copy 1000 records.
  • SORT - a batch operation to sort one million records.
  • DebitCredit - a short transaction with terminal input and output via X.25, presentation services, and a mix of five database accesses.

The DebitCredit transaction has rules for scaling the terminal network and database size as the transaction rate increases, and also rules for distributing transactions if the system is decentralized.

The performance metrics for this benchmark are:

  • Elapsed time for the SCAN and SORT.
  • Peak throughput for the DebitCredit transaction at 1-second response time for 95% of the transactions. This gives a TPS (Transactions Per Second) rating.
  • Price per transaction, where price is the 5-year cost of hardware, software and maintenance. This is sometimes called the vendor's view of price.

This benchmark has been adopted by several vendors to compare their performance and price/performance from release to release, and also to compare their performance to competitive products. MIPS, Whetstones and MegaFLOPs have served a similar role in the scientific community.

A system's TPS rating indicates not just processor speed, but also IO architecture, operating system, data communications and database software performance. Unfortunately, it does not capture ease-of-use.
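As a rough, hedged illustration (the throughput levels, response-time fractions, and 5-year cost below are invented, not figures from the paper), the following Python sketch shows how the two DebitCredit metrics combine: the TPS rating is the peak throughput at which at least 95% of transactions still complete within 1 second, and the price metric divides the 5-year cost of hardware, software and maintenance by that rating.

  # Hypothetical measurement runs: (offered TPS, fraction of transactions under 1 second).
  runs = [
      (40, 0.999),
      (60, 0.985),
      (80, 0.962),
      (100, 0.910),
  ]

  # TPS rating: highest throughput still meeting the 1-second, 95th-percentile requirement.
  tps_rating = max(tps for tps, frac_under_1s in runs if frac_under_1s >= 0.95)

  # Vendor's view of price: assumed 5-year cost of hardware, software and maintenance.
  five_year_cost = 2_500_000  # dollars (assumption)
  price_per_tps = five_year_cost / tps_rating

  print(f"TPS rating: {tps_rating} TPS")                      # 80 TPS
  print(f"Price/performance: ${price_per_tps:,.0f} per TPS")  # $31,250 per TPS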
Work continues on formalizing these benchmarks. At present they are written in English. Ultimately they should be defined by a file generator and a set of programs written in a standard database language such as COBOL-SQL.

When a vendor first measures his system against these benchmarks, the results are usually terrible. Both benchmarks are designed to expose generic performance bugs in frequently used transaction processing atoms. For example, the Wisconsin and SCAN benchmarks heavily penalize a system which is slow to read the next record in a file.

A system with poor performance on these benchmarks can be analyzed as follows. Most vendors have an “atomic” model of their system which represents each transaction as a collection of atoms. The atoms are the primitives of the system. For example, the SCAN benchmark is represented by most vendors as:

  SCAN: BEGIN TRANSACTION
        PERFORM 1000 TIMES
            READ SEQUENTIAL
            INSERT SEQUENTIAL
        COMMIT TRANSACTION

The atomic weights for BEGIN, READ SEQUENTIAL, INSERT SEQUENTIAL, and COMMIT are measured for each release. The atomic weight usually consists of CPU instructions, message bytes, and disc IOs for a “typical” call to that operation. These weights can be converted to service times by knowing the speeds and utilizations of the devices (processors, discs, lines) used for the application. The molecular weight and service time of SCAN can then be computed as the sum of the atomic weights.

Defining and measuring a system's atoms is valuable. It produces a simple conceptual model of how the system is used. Atomic measurements also expose performance bugs. For example, based on the SCAN benchmark, most systems perform a READ SEQUENTIAL in about 1000 instructions and 0.02 disc IOs. If a system uses many more instructions or many more IOs, then it has a performance problem. Similarly, the DebitCredit transaction typically consumes about 200Ki (thousand instructions) and five disc IOs per transaction. One system is known to use 800Ki and 14 IOs per transaction. The vendor could use atomic measurement to find the causes of such poor performance. When such problems are localized to an atom, solutions readily suggest themselves. So atomic measurement is useful for performance assurance and performance improvement.

Atomic measurement also has a major role in system sizing and in capacity planning. If the customer can describe his application in terms of atoms, then a spreadsheet application can give him an estimate of the CPU, disc and line cost for the application. With substantially more effort (and assumptions), the system's response time can be predicted. With even more effort, a prototype system can be generated and benchmarked from the atomic transaction descriptions. Snapshot [Stewart] and Envision [Envision] are examples of systems which combine atomic modeling, queue modeling, and ultimately benchmarking of real systems generated from the atomic description of the application.
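To make the atomic model concrete, here is a hedged Python sketch of the roll-up described above. The only figures taken from the abstract are the roughly 1000 instructions and 0.02 disc IOs for READ SEQUENTIAL; the other atomic weights, the device speeds, and the (CPU instructions, message bytes, disc IOs) breakdown for each atom are assumptions made purely for illustration.

  # Assumed atomic weights: (CPU instructions, message bytes, disc IOs) per call.
  atoms = {
      "BEGIN":             (3_000,   0, 0.00),  # assumption
      "READ SEQUENTIAL":   (1_000,   0, 0.02),  # figures cited in the abstract
      "INSERT SEQUENTIAL": (2_000,   0, 0.05),  # assumption
      "COMMIT":            (5_000, 200, 1.00),  # assumption
  }

  # SCAN as a molecule: one BEGIN, 1000 (READ, INSERT) pairs, one COMMIT.
  molecule = [("BEGIN", 1), ("READ SEQUENTIAL", 1000),
              ("INSERT SEQUENTIAL", 1000), ("COMMIT", 1)]

  cpu_instructions = sum(atoms[name][0] * count for name, count in molecule)
  disc_ios = sum(atoms[name][2] * count for name, count in molecule)

  # Convert the molecular weight to a service time using assumed device speeds.
  instructions_per_second = 1.0e6  # a 1-MIPS processor (assumption)
  seconds_per_disc_io = 0.030      # 30 ms per disc IO (assumption)
  service_time = (cpu_instructions / instructions_per_second
                  + disc_ios * seconds_per_disc_io)

  print(f"SCAN molecular weight: {cpu_instructions:,} instructions, {disc_ios:.0f} disc IOs")
  print(f"Estimated SCAN service time: {service_time:.1f} seconds")

The same spreadsheet-style roll-up, repeated per transaction type and multiplied by expected transaction rates, gives the CPU, disc and line demands used for sizing and capacity planning; comparing measured atomic weights against expected ones (for example, the 200Ki and five disc IOs cited for DebitCredit) is how performance bugs are localized to an atom.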