Cluster hash tables (CHTs) are key components of many large-scale Internet services because of their highly scalable performance and the prevalence of the type of data they store. Another advantage of CHTs is that they can be designed to be as self-managing as a cluster of stateless servers. One key to achieving this extreme manageability is reboot-based recovery that is predictably fast and has only a modest impact on system performance and availability. This "cheap" recovery mechanism simplifies management in two ways. First, it simplifies failure detection by lowering the cost of acting on false positives. This makes it practical to use statistical techniques to turn hard-to-catch failures, such as node degradation, into an outright failure followed by recovery. Second, cheap recovery simplifies capacity planning by recasting repartitioning as failure plus recovery, which achieves zero-downtime incremental scaling. These low-cost recovery and scaling mechanisms make it possible for the system to be continuously self-adjusting, a key property of self-managing systems.
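To make the failure-detection idea concrete, here is a minimal sketch of how a statistical check might flag a degraded node and respond with a reboot. It is an illustration under stated assumptions, not the method used in the work summarized above: the choice of median absolute deviation (MAD) as the statistic, the threshold of 3.0, and the names `find_degraded_nodes`, `recover`, and `reboot` are all hypothetical.

```python
from statistics import median


def find_degraded_nodes(latencies_ms, threshold=3.0):
    """Return the names of nodes whose latency is an outlier among peers.

    Assumption for this sketch: a node counts as degraded when its latency
    exceeds the cluster median by more than `threshold` median absolute
    deviations (MAD). The abstract does not name a specific statistic.
    """
    values = list(latencies_ms.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All nodes behave identically; nothing stands out.
        return []
    return sorted(
        node for node, v in latencies_ms.items()
        if (v - med) / mad > threshold
    )


def recover(latencies_ms, reboot, threshold=3.0):
    """Reboot every node flagged as degraded and return the flagged names.

    `reboot` is a hypothetical callback. Because recovery is assumed to be
    cheap, an occasional false positive costs little, so the detector can
    afford to be aggressive.
    """
    flagged = find_degraded_nodes(latencies_ms, threshold)
    for node in flagged:
        reboot(node)
    return flagged
```

With latencies of 10, 11, 9, 10, and 60 ms across five nodes, only the 60 ms node is flagged. A node that is merely slow is treated like a failed node and rebooted, which is acceptable only because a reboot is predictably fast and has little impact on the rest of the cluster.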