Measurements of a distributed file system
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Analysis of file I/O traces in commercial computing environments
SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Self-similarity in file systems
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
A large-scale study of file-system contents
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
File system usage in Windows NT 4.0
Proceedings of the seventeenth ACM symposium on Operating systems principles
Password authentication with insecure communication
Communications of the ACM
A relational model of data for large shared data banks
Communications of the ACM
Query optimization in compressed database systems
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Information and control in gray-box systems
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Understanding BGP misconfiguration
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications
Characteristics of I/O Traffic in Personal Computer and Server
Characteristics of I/O Traffic in Personal Computer and Server
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
VPC3: a fast and effective trace-compression algorithm
Proceedings of the joint international conference on Measurement and modeling of computer systems
Characteristics of I/O traffic in personal computer and server workloads
IBM Systems Journal
Shield: vulnerability-driven network filters for preventing known vulnerability exploits
Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
STRIDER: A Black-box, State-based Approach to Change and Configuration Management and Support
LISA '03 Proceedings of the 17th USENIX conference on System administration
Gatekeeper: Monitoring Auto-Start Extensibility Points (ASEPs) for Spyware Management
LISA '04 Proceedings of the 18th USENIX conference on System administration
ReVirt: enabling intrusion analysis through virtual-machine logging and replay
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Forensix: A Robust, High-Performance Reconstruction System
ICDCSW '05 Proceedings of the Second International Workshop on Security in Distributed Computing Systems (SDCS) (ICDCSW'05) - Volume 02
Towards a Self-Managing Software Patching Process Using Black-Box Persistent-State Manifests
ICAC '04 Proceedings of the First International Conference on Autonomic Computing
Metadata Efficiency in Versioning File Systems
FAST '03 Proceedings of the 2nd USENIX Conference on File and Storage Technologies
Passive NFS Tracing of Email and Research Workloads
FAST '03 Proceedings of the 2nd USENIX Conference on File and Storage Technologies
Wayback: a user-level versioning file system for linux
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Configuration debugging as search: finding the needle in the haystack
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Automatic misconfiguration troubleshooting with peerpressure
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
SSYM'03 Proceedings of the 12th conference on USENIX Security Symposium - Volume 12
Why do internet services fail, and what can be done about it?
USITS'03 Proceedings of the 4th conference on USENIX Symposium on Internet Technologies and Systems - Volume 4
Pip: detecting the unexpected in distributed systems
NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3
A comparison of file system workloads
ATEC '00 Proceedings of the annual conference on USENIX Annual Technical Conference
Reconstructing system state for intrusion analysis
ACM SIGOPS Operating Systems Review
Automatic software fault diagnosis by exploiting application signatures
LISA'08 Proceedings of the 22nd conference on Large installation system administration conference
Understanding customer problem troubleshooting from storage system logs
FAST '09 Proccedings of the 7th conference on File and storage technologies
Practical performance models for complex, popular applications
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Proposal and evaluation of data reduction method for tracing based pre-patch impact analysis
APNOMS'09 Proceedings of the 12th Asia-Pacific network operations and management conference on Management enabling the future internet for changing business and new computing services
Refactoring human roles solves systems problems
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
Homogeneity as an advantage: it takes a community to protect an application
CollSec'10 Proceedings of the 2010 international conference on Collaborative methods for security and privacy
Repair from a chair: computer repair as an untrusted cloud service
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Safe side effects commitment for OS-level virtualization
Proceedings of the 8th ACM international conference on Autonomic computing
Proceedings of the 6th International Systems and Storage Conference
Hi-index | 0.00 |
Mismanagement of the persistent state of a system---all the executable files, configuration settings and other data that govern how a system functions---causes reliability problems, security vulnerabilities, and drives up operation costs. Recent research traces persistent state interactions---how state is read, modified, etc.---to help troubleshooting, change management and malware mitigation, but has been limited by the difficulty of collecting, storing, and analyzing the 10s to 100s of millions of daily events that occur on a single machine, much less the 1000s or more machines in many computing environments. We present the Flight Data Recorder (FDR) that enables always-on tracing, storage and analysis of persistent state interactions. FDR uses a domain-specific log format, tailored to observed file system workloads and common systems management queries. Our lossless log format compresses logs to only 0.5--0.9 bytes per interaction. In this log format, 1000 machine-days of logs---over 25 billion events---can be analyzed in less than 30 minutes. We report on our deployment of FDR to 207 production machines at MSN, and show that a single centralized collection machine can potentially scale to collecting and analyzing the complete records of persistent state interactions from 4000+ machines. Furthermore, our tracing technology is shipping as part of the Windows Vista OS.