System administrators have used log analysis for decades to monitor and automate their environments. As compute environments grow and the scope and volume of logs increase, it becomes harder to obtain timely, useful data and appropriate triggers for automation with traditional tools like Swatch. Cloud computing intensifies this problem as the number of systems in datacenters increases dramatically. To address these problems at AMD, we developed a tool called the Variable Temporal Event Correlator (VTEC). VTEC has several distinctive design features: an inherently multi-threaded/multi-process design, a flexible and extensible programming interface, built-in job queuing, and a novel method for storing and describing temporal information about events. Together these features make VTEC well suited to handling a broad range of event correlation tasks quickly and efficiently in real time, and they allow it to scale to tens of gigabytes of log data per day. This paper describes the architecture, use, and efficacy of this tool, which has been in production at AMD for more than four years.
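The abstract does not describe VTEC's rule syntax or its temporal storage format, so the following is only a generic sketch of the kind of time-window event correlation discussed above, similar in spirit to SEC's SingleWithThreshold rule type: count matching log lines per key and fire once a threshold is reached within a sliding window. `ThresholdRule`, `feed`, and the NFS log message are hypothetical illustrations, not VTEC's API or its method.

```python
import re
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class ThresholdRule:
    """Hypothetical rule: fire when `pattern` matches `threshold` times for the
    same key (capture group 1 of `pattern`) within `window` seconds."""
    pattern: re.Pattern
    threshold: int
    window: float
    # Per-key deque of match timestamps still inside the window.
    _seen: dict = field(default_factory=lambda: defaultdict(deque),
                        init=False, repr=False)

    def feed(self, timestamp: float, line: str):
        """Process one log line; return the key if the rule fires, else None."""
        m = self.pattern.search(line)
        if m is None:
            return None
        key = m.group(1)  # assumes the pattern has at least one capture group
        times = self._seen[key]
        times.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()  # reset so one burst triggers one action
            return key
        return None


rule = ThresholdRule(re.compile(r"NFS server (\S+) not responding"),
                     threshold=3, window=60.0)
log = [
    (0.0, "kernel: NFS server fs01 not responding"),
    (10.0, "kernel: NFS server fs02 not responding"),
    (20.0, "kernel: NFS server fs01 not responding"),
    (50.0, "kernel: NFS server fs01 not responding"),
    (200.0, "kernel: NFS server fs02 not responding"),
]
for ts, line in log:
    host = rule.feed(ts, line)
    if host:
        print(f"{ts}: {host} unresponsive {rule.threshold}x in {rule.window}s")
```

Run as written, this prints a single alert for `fs01` at timestamp 50.0; the two `fs02` messages are 190 seconds apart, so they never fall within the same 60-second window. Keying the window per host keeps one noisy server from masking or triggering alerts for another.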