Advances in hardware have enabled many long-running applications to execute entirely in main memory. As a result, these applications have increasingly turned to database techniques to ensure durability in the event of a crash. However, many of these applications, such as massively multiplayer online games and main-memory OLTP systems, must sustain extremely high update rates, often hundreds of thousands of updates per second. Providing durability for these applications without introducing excessive overhead or latency spikes remains a challenge for application developers. In this paper, we take advantage of frequent points of consistency in many of these applications to develop novel checkpoint recovery algorithms that trade additional space in main memory for significantly lower overhead and latency. Compared to previous work, our new algorithms do not require any locking or bulk copies of the application state. Our experimental evaluation shows that one of our new algorithms attains nearly constant latency and reduces overhead by more than an order of magnitude for low to medium update rates. Additionally, in a heavily loaded main-memory transaction processing system, it still reduces overhead by more than a factor of two.
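To make the space-for-overhead trade-off concrete, here is a minimal single-threaded Python sketch of one way a checkpoint could avoid both locking and bulk copies: keep two copies of every word plus two bit vectors, and flip the write target only at a point of consistency. The class `DoubleCopyState`, its method names, and the bit-vector layout are hypothetical choices made for this illustration. The abstract does not spell out the algorithms, so this should not be read as the paper's implementation.

```python
class DoubleCopyState:
    """Toy state store: two copies of every word plus two bit vectors.

    Hypothetical illustration only; all names are assumptions, not from the paper.
    """

    def __init__(self, values):
        n = len(values)
        self.copies = [list(values), list(values)]
        self.read_from = [0] * n  # copy holding the latest value of word i
        self.write_to = [1] * n   # copy that the next update of word i targets

    def read(self, i):
        return self.copies[self.read_from[i]][i]

    def update(self, i, value):
        # Updates never touch the copy reserved for the checkpoint writer.
        c = self.write_to[i]
        self.copies[c][i] = value
        self.read_from[i] = c

    def begin_checkpoint(self):
        # Call only at a point of consistency (e.g. between game ticks).
        # Flipping one bit per word freezes the current values; no data is copied.
        for i in range(len(self.write_to)):
            self.write_to[i] = 1 - self.read_from[i]

    def checkpoint_words(self):
        # Lazily yield the frozen value of each word, so updates can interleave.
        for i in range(len(self.write_to)):
            yield self.copies[1 - self.write_to[i]][i]


state = DoubleCopyState([1, 2, 3])
state.update(0, 10)
state.begin_checkpoint()             # consistent state is [10, 2, 3]
writer = state.checkpoint_words()
first = next(writer)                 # checkpoint writer handles word 0
state.update(1, 20)                  # updates arrive mid-checkpoint
state.update(0, 11)
snapshot = [first] + list(writer)    # remaining words are still frozen
live = [state.read(i) for i in range(3)]
print(snapshot, live)
```

In this sketch the snapshot reflects the state at the moment `begin_checkpoint` was called, while later updates remain visible to the application through `read`. The cost is a second copy of the state and two bits per word. A real system would run the checkpoint writer concurrently and would have to wait for it to finish before starting the next checkpoint, which this toy version does not enforce.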