Experiences with VI communication for database storage

Authors:
Yuanyuan Zhou;Angelos Bilas;Suresh Jagannathan;Cezary Dubnicki;James F. Philbin;Kai Li
Affiliations:
Emphora Inc., Princeton, NJ;University of Toronto, Toronto, Ontario M5S3G4, Canada;Emphora Inc., Princeton, NJ;Emphora Inc., Princeton, NJ;Emphora Inc., Princeton, NJ;Princeton University, Princeton, NJ
Venue:
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Year:
2002

Citing 14
Cited 24

U-Net: a user-level network interface for parallel and distributed computing

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
The impact of architectural trends on operating system performance

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
The HP AutoRAID hierarchical storage system

ACM Transactions on Computer Systems (TOCS) - Special issue on operating system principles
Design of the TruCluster multicomputer system for the Digital UNIX environment

Digital Technical Journal
File server scaling with network-attached secure disks

SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads

Proceedings of the 25th annual international symposium on Computer architecture
A cost-effective, high-bandwidth storage architecture

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
UTLB: a mechanism for address translation on network interfaces

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Fast Messages: Efficient, Portable Communication for Workstation Clusters and MPPs

IEEE Parallel & Distributed Technology: Systems & Technology
DBMSs on a Modern Processor: Where Does Time Go?

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
The Multi-Queue Replacement Algorithm for Second Level Buffer Caches

Proceedings of the General Track: 2002 USENIX Annual Technical Conference
Overview of memory channel network for PCI

COMPCON '96 Proceedings of the 41st IEEE International Computer Conference
User-Level Communication in Cluster-Based Servers

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Cheating the I/O bottleneck: network storage with Trapeze/Myrinet

ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference

Database research at the University of Illinois at Urbana-Champaign

ACM SIGMOD Record
miNI: reducing network interface memory requirements with dynamic handle lookup

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
High performance RDMA-based MPI implementation over InfiniBand

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Dynamic Data Replication: An Approach to Providing Fault-Tolerant Shared Memory Clusters

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Performance measurements of a user-space DAFS server with a database workload

NICELI '03 Proceedings of the ACM SIGCOMM workshop on Network-I/O convergence: experience, lessons, implications
Application performance on the Direct Access File System

WOSP '04 Proceedings of the 4th international workshop on Software and performance
Second-Level Buffer Cache Management

IEEE Transactions on Parallel and Distributed Systems
PB-LRU: a self-tuning power aware storage cache replacement algorithm for conserving disk energy

Proceedings of the 18th annual international conference on Supercomputing
Power-Aware Storage Cache Management

IEEE Transactions on Computers
PRESS: A Clustered Server Based on User-Level Communication

IEEE Transactions on Parallel and Distributed Systems
Empirical evaluation of multi-level buffer cache collaboration for storage systems

SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
In-Kernel Integration of Operating System and Infiniband Functions for High Performance Computing Clusters: A DSM Example

IEEE Transactions on Parallel and Distributed Systems
High performance support of parallel virtual file system (PVFS2) over Quadrics

Proceedings of the 19th annual international conference on Supercomputing
Hibernator: helping disk arrays sleep through the winter

Proceedings of the twentieth ACM symposium on Operating systems principles
High performance RDMA-based MPI implementation over infiniBand

International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Design Trade-Offs for User-Level I/O Architectures

IEEE Transactions on Computers
Demotion-based exclusive caching through demote buffering: design and evaluations over different networks

SNAPI '03 Proceedings of the international workshop on Storage network architecture and parallel I/Os
Efficient remote block-level I/O over an RDMA-capable NIC

Proceedings of the 20th annual international conference on Supercomputing
An SSL Back-End Forwarding Scheme in Cluster-Based Web Servers

IEEE Transactions on Parallel and Distributed Systems
Optimization and bottleneck analysis of network block I/O in commodity storage systems

Proceedings of the 21st annual international conference on Supercomputing
Benefits of high speed interconnects to cluster file systems: a case study with lustre

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Scalable memory registration for high performance networks using helper threads

Proceedings of the 8th ACM International Conference on Computing Frontiers
Providing safe, user space access to fast, solid state disks

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
RXIO: Design and implementation of high performance RDMA-capable GridFTP

Computers and Electrical Engineering

Abstract

This paper examines how VI-based interconnects can be used to improve I/O path performance between a database server and the storage subsystem. We design and implement a software layer, DSA, that sits between the application and VI. DSA takes advantage of specific VI features and deals with many of its shortcomings. We provide and evaluate one kernel-level and two user-level implementations of DSA. These implementations trade transparency and generality for performance to different degrees and, unlike research prototypes, are designed to be suitable for real-world deployment. We present detailed measurements using a commercial database management system with both micro-benchmarks and industrial database workloads on a mid-size, 4-CPU database server and a large, 32-CPU database server. Our results show that VI-based interconnects and user-level communication can improve all aspects of the I/O path between the database system and the storage back-end. We also find that making effective use of VI in I/O-intensive environments requires substantially more functionality than VI currently provides. Finally, new storage APIs that help minimize kernel involvement in the I/O path are needed to fully exploit the benefits of VI-based communication.