OFRewind: enabling record and replay troubleshooting for networks

  • Authors:
  • Andreas Wundsam; Dan Levin; Srini Seetharaman; Anja Feldmann

  • Affiliations:
  • Deutsche Telekom Laboratories, TU Berlin; Deutsche Telekom Laboratories, TU Berlin; Deutsche Telekom Inc., R&D Lab; Deutsche Telekom Laboratories, TU Berlin

  • Venue:
  • USENIX ATC '11: Proceedings of the 2011 USENIX Annual Technical Conference
  • Year:
  • 2011

Abstract

Debugging operational networks can be a daunting task because of their size, their distributed state, and the presence of black-box components such as commercial routers and switches, which are poorly instrumentable and only coarsely configurable. The debugging tool set available to administrators is limited and provides only aggregated statistics (SNMP), sampled data (NetFlow/sFlow), or local measurements on single hosts (tcpdump). In this paper, we leverage split forwarding architectures such as OpenFlow to add record and replay debugging capabilities to networks, a powerful yet currently lacking approach. We present the design of OFRewind, which enables scalable, multi-granularity, temporally consistent recording and coordinated replay in a network, with fine-grained, dynamic, centrally orchestrated control over both record and replay. Thus, OFRewind helps operators reproduce software errors, identify datapath limitations, or locate configuration errors.
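
The general idea of temporally consistent recording and coordinated replay can be illustrated with a minimal sketch. This is not OFRewind's implementation or API: the names `Event`, `Recorder`, and `replay` are hypothetical, the payloads are opaque bytes rather than real OpenFlow messages, and all timestamps are assumed to come from one shared clock. The sketch merges per-switch logs into a single time-ordered trace and re-emits it while preserving the relative gaps between events.

```python
import heapq
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass(order=True)
class Event:
    """One recorded message (hypothetical format, not OFRewind's)."""
    timestamp: float                        # seconds, assumed from a shared clock
    seq: int                                # tie-breaker for equal timestamps
    switch_id: str = field(compare=False)   # which switch the message belongs to
    payload: bytes = field(compare=False)   # opaque message bytes


class Recorder:
    """Keeps one log per switch; a stand-in for a recording proxy."""

    def __init__(self) -> None:
        self._logs: Dict[str, List[Event]] = {}
        self._seq = 0

    def record(self, switch_id: str, timestamp: float, payload: bytes) -> None:
        self._seq += 1
        event = Event(timestamp, self._seq, switch_id, payload)
        self._logs.setdefault(switch_id, []).append(event)

    def merged(self) -> List[Event]:
        # Sort each per-switch log, then merge them into one global trace
        # ordered by (timestamp, seq).
        return list(heapq.merge(*(sorted(log) for log in self._logs.values())))


def replay(
    trace: Iterable[Event],
    send: Callable[[str, bytes], None],
    sleep: Callable[[float], None] = time.sleep,
    speedup: float = 1.0,
) -> None:
    """Re-emit events in trace order, keeping relative gaps scaled by speedup."""
    previous = None
    for event in trace:
        if previous is not None:
            sleep((event.timestamp - previous) / speedup)
        send(event.switch_id, event.payload)
        previous = event.timestamp
```

Passing `sleep` as a parameter lets a caller substitute a fake clock, for example `replay(rec.merged(), send=print, sleep=lambda s: None)` to replay a trace as fast as possible.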