Fast checkpoint recovery algorithms for frequently consistent applications

Authors:
Tuan Cao;Marcos Vaz Salles;Benjamin Sowell;Yao Yue;Alan Demers;Johannes Gehrke;Walker White
Affiliations:
Cornell University, Ithaca, NY, USA;Cornell University, Ithaca, NY, USA;Cornell University, Ithaca, NY, USA;Cornell University, Ithaca, NY, USA;Cornell University, Ithaca, NY, USA;Cornell University, Ithaca, NY, USA;Cornell University, Ithaca, NY, USA
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011

Abstract

Advances in hardware have enabled many long-running applications to execute entirely in main memory. As a result, these applications have increasingly turned to database techniques to ensure durability in the event of a crash. However, many of these applications, such as massively multiplayer online games and main-memory OLTP systems, must sustain extremely high update rates, often hundreds of thousands of updates per second. Providing durability for these applications without introducing excessive overhead or latency spikes remains a challenge for application developers. In this paper, we take advantage of frequent points of consistency in many of these applications to develop novel checkpoint recovery algorithms that trade additional space in main memory for significantly lower overhead and latency. Compared to previous work, our new algorithms do not require any locking or bulk copies of the application state. Our experimental evaluation shows that one of our new algorithms attains nearly constant latency and reduces overhead by more than an order of magnitude for low to medium update rates. Additionally, in a heavily loaded main-memory transaction processing system, it still reduces overhead by more than a factor of two.
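
The abstract does not spell out the algorithms, so the following is only a minimal, single-threaded sketch of the general idea it describes: spending extra main memory so that a checkpoint can be taken at a point of consistency without locking or bulk-copying the application state. The class name `DoubleBufferedCheckpointer`, its methods, and the dictionary standing in for stable storage are all hypothetical and are not taken from the paper. The sketch also assumes that `flush` finishes before the next point of consistency and that the initial state is all zeros.

```python
from typing import Dict, List, Optional


class DoubleBufferedCheckpointer:
    """Toy illustration (hypothetical names): two shadow buffers cost extra
    memory, but reaching a point of consistency is a constant-time swap."""

    def __init__(self, size: int) -> None:
        self.state: List[int] = [0] * size  # live application state
        # One shadow buffer collects the current period's updates while the
        # other is drained by the checkpoint writer.
        self.shadow: List[List[Optional[int]]] = [[None] * size, [None] * size]
        self.collecting = 0  # index of the buffer currently receiving updates
        self.stable: Dict[int, int] = {}  # stands in for the on-disk checkpoint

    def update(self, index: int, value: int) -> None:
        # Every update is applied to the live state and recorded in the
        # collecting buffer; a non-None slot doubles as a dirty marker.
        self.state[index] = value
        self.shadow[self.collecting][index] = value

    def point_of_consistency(self) -> None:
        # Swap buffer roles: no lock is taken and no state is copied.
        # Assumption: the previous flush() has already completed.
        self.collecting = 1 - self.collecting

    def flush(self) -> int:
        # Write the frozen buffer's dirty entries to stable storage and
        # clear them so the buffer can be reused after the next swap.
        frozen = self.shadow[1 - self.collecting]
        written = 0
        for index, value in enumerate(frozen):
            if value is not None:
                self.stable[index] = value
                frozen[index] = None
                written += 1
        return written

    def recover(self) -> List[int]:
        # Rebuild the state as of the last flushed point of consistency.
        restored = [0] * len(self.state)
        for index, value in self.stable.items():
            restored[index] = value
        return restored


if __name__ == "__main__":
    ckpt = DoubleBufferedCheckpointer(4)
    ckpt.update(0, 5)
    ckpt.update(2, 7)
    ckpt.point_of_consistency()  # end of the first period
    ckpt.update(0, 9)            # belongs to the next period
    ckpt.flush()
    print(ckpt.recover())        # state as of the first point of consistency
```

In a real system the flush would run concurrently with updates on a separate thread and write to disk; the sketch keeps everything sequential so that the separation between the collecting buffer and the frozen buffer is easy to follow.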