An active data-aware cache consistency protocol for highly-scalable data-shipping DBMS architectures

  • Authors:
  • Keqiang Wu;Peng-fei Chuang;David J. Lilja

  • Affiliations:
  • University of Minnesota, Minneapolis, MN;University of Minnesota, Minneapolis, MN;University of Minnesota, Minneapolis, MN

  • Venue:
  • Proceedings of the 1st conference on Computing frontiers
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

In a data-shipping database system, data items are retrieved from the server machines, cached and processed at the client machines, and then shipped back to the server. Current cache consistency approaches typically rely on a centralized server or servers to enforce the necessary concurrency control actions. This centralized server imposes a limitation on the scalability and performance of these systems. This paper presents a new consistency protocol, Active Data-aware Cache Consistency (ADCC), that allows clients to be aware of the global state of their cached data via a two-tier directory. Using parallel communication with simultaneous client-server and client-client messages, ADCC reduces the network latency for detecting data conflicts by 50%, while increasing message overhead by about 8% only. In addition, ADCC improves scalability by partially offloading the concurrency control function from the server to the clients. An optimization, Lazy Update, is introduced to reduce the message overhead for maintaining client directory consistency. We implement ADCC in a page server DBMS architecture and compare it with the leading cache consistency algorithm, Callback Locking (CBL), which is the most widely implemented algorithm in commercial DBMSs. Our performance study shows that ADCC has a similar or lower abort rate, higher throughput, and better scalability for important workloads and system configurations. Both the simulation results and the analytic study indicate that the message overhead is low and that ADCC produces better behavior compared to the traditional server-based communication under high contention workloads.