Session state: beyond soft state

Authors:
Benjamin C. Ling;Emre Kiciman;Armando Fox
Venue:
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Year:
2004

References (works this paper cites): 23
Cited by: 16

Leases: an efficient fault-tolerant mechanism for distributed file cache consistency

SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Analysis and simulation of a fair queueing algorithm

SIGCOMM '89 Symposium proceedings on Communications architectures & protocols
Replication in the harp file system

SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Petal: distributed virtual disks

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Generational garbage collection and the radioactive decay model

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
A model, analysis, and protocol framework for soft state-based communication

Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication
SEDA: an architecture for well-conditioned, scalable internet services

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Lessons from Giant-Scale Services

IEEE Internet Computing
Perfect Failure Detection in Timed Asynchronous Systems

IEEE Transactions on Computers
Hippodrome: Running Circles Around Storage Administration

FAST '02 Proceedings of the Conference on File and Storage Technologies
Finding surprising patterns in a time series database in linear time and space

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Weighted voting for replicated data

SOSP '79 Proceedings of the seventh ACM symposium on Operating systems principles
Harvest, Yield, and Scalable Tolerant Systems

HOTOS '99 Proceedings of the Seventh Workshop on Hot Topics in Operating Systems
A Methodology for Detection and Estimation of Software Aging

ISSRE '98 Proceedings of the Ninth International Symposium on Software Reliability Engineering
A comparison of hard-state and soft-state signaling protocols

Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications
Fail-Stutter Fault Tolerance

HOTOS '01 Proceedings of the Eighth Workshop on Hot Topics in Operating Systems
Crash-only software

HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Using performance reflection in systems software

HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Palimpsest: soft-capacity storage for planetary-scale services

HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
FAB: enterprise storage systems on a shoestring

HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
The case for a session state storage layer

HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Scalable, distributed data structures for internet service construction

OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Berkeley DB

ATEC '99 Proceedings of the annual conference on USENIX Annual Technical Conference

FAB: building distributed enterprise disk arrays from commodity components

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Cheap recovery: a key to self-managing state

ACM Transactions on Storage (TOS)
Combining statistical monitoring and predictable recovery for self-management

WOSS '04 Proceedings of the 1st ACM SIGSOFT workshop on Self-managed systems
Autonomous recovery in componentized Internet applications

Cluster Computing
J2EE server scalability through EJB replication

Proceedings of the 2006 ACM symposium on Applied computing
Microreboot — A technique for cheap recovery

OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
Minimal backups of cryptographic protocol runs

Proceedings of the 6th ACM workshop on Formal methods in security engineering
State considerations in distributed systems

Crossroads
Obtaining resource controllability in service cooperation environments

Proceedings of the 7th International Conference on Mobile and Ubiquitous Multimedia
Secure resource control in service oriented applications

CCNC'09 Proceedings of the 6th IEEE Conference on Consumer Communications and Networking Conference
Centrifuge: integrated lease management and partitioning for cloud services

NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Dynamically scaling applications in the cloud

ACM SIGCOMM Computer Communication Review
A study on scalability of services and privacy issues in cloud computing

ICDCIT'12 Proceedings of the 8th international conference on Distributed Computing and Internet Technology
Context-aware resource management for secure end-to-end QoS provision in service oriented applications

Journal of Ambient Intelligence and Smart Environments
Context-aware resource management for secure end-to-end QoS provision in service oriented applications

Journal of Ambient Intelligence and Smart Environments
Pico replication: a high availability framework for middleboxes

Proceedings of the 4th annual Symposium on Cloud Computing

Abstract

The cost and complexity of administering large systems have come to dominate their total cost of ownership. Stateless and soft-state components, e.g., Web servers or network routers, are easy to manage: capacity can be scaled incrementally by adding more nodes, rebalancing of load after failover is easy, and reactive or proactive ("rolling") reboots can be used to handle transient failures. We show that it is possible to achieve the same ease of management for the state-storage subsystem by subdividing persistent state according to the specific guarantees needed by each type. While other systems [19,17] have addressed persistent-until-deleted state, we describe SSM, a store for a previously unaddressed class of state (user-session state) that exhibits the same manageability properties as stateless nodes while providing firm storage guarantees. Any node can be proactively or reactively rebooted at any time to recover from transient faults, without impacting online performance or losing data. We exploit this simplified manageability by pairing SSM with an application-generic, statistical-anomaly-based framework that detects crashes, hangs, and performance failures, and automatically attempts to recover from them by rebooting faulty nodes. Although the detection techniques generate some false positives, the cost of recovery is so low that they have little impact. We provide microbenchmarks to demonstrate SSM's built-in overload protection, failure management, and self-tuning. We benchmark SSM integrated into a production enterprise-scale interactive service to demonstrate that these benefits need not come at the cost of significantly decreased throughput or response time.
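
The idea of pairing cheap reboots with statistical anomaly detection can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which statistic the framework uses, so the median absolute deviation (MAD) below is an assumed stand-in, and the names `find_anomalous_nodes`, `recover`, and the `reboot` callback, as well as the threshold of 3.0, are hypothetical. Each node's metric is compared against its peers, and any node that deviates strongly is rebooted. Because a reboot is assumed to be cheap and lossless, an occasional false positive costs little.

```python
from statistics import median


def find_anomalous_nodes(metric_by_node, threshold=3.0):
    """Return (sorted) names of nodes whose metric is far from the peer median.

    "Far" means more than `threshold` median absolute deviations (MAD) away.
    The MAD statistic and the default threshold are assumptions made for
    this illustration only.
    """
    values = list(metric_by_node.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the nodes report exactly the median value, so any
        # node that differs at all is treated as an outlier.
        return sorted(n for n, v in metric_by_node.items() if v != med)
    return sorted(
        n for n, v in metric_by_node.items() if abs(v - med) / mad > threshold
    )


def recover(metric_by_node, reboot, threshold=3.0):
    """Reboot every suspect node via the caller-supplied `reboot` callback."""
    suspects = find_anomalous_nodes(metric_by_node, threshold)
    for node in suspects:
        reboot(node)  # assumed cheap and lossless, so false positives are tolerable
    return suspects


if __name__ == "__main__":
    # Hypothetical per-node request latencies in milliseconds.
    latencies = {
        "node1": 10.0,
        "node2": 11.0,
        "node3": 9.5,
        "node4": 10.5,
        "node5": 95.0,
    }
    rebooted = []
    recover(latencies, rebooted.append)
    print(rebooted)
```

In this sketch the detector needs no application-specific knowledge: it only compares nodes with one another, which is one way to read "application-generic" in the abstract.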