Long term data storage issues for situational awareness

  • Authors:
  • John McHugh

  • Affiliations:
  • Dalhousie University and University of North Carolina

  • Venue:
  • Proceedings of the 5th Annual Workshop on Cyber Security and Information Intelligence Research: Cyber Security and Information Intelligence Challenges and Strategies
  • Year:
  • 2009

Abstract

Network traffic archives are useful for a number of purposes, ranging from forensic investigations to retrospective studies of how network traffic characteristics evolve. The sheer volume of data that might be useful, if retained, imposes stresses on data storage and management systems. This is exacerbated by the fact that a substantial portion of network traffic is essentially noise and, as the archive ages, is interesting primarily at an aggregate level, while the rest may stay interesting at the packet or flow level for an indefinite period. This paper discusses two cases, high-volume scans and very infrequent traffic, where lossy compression may be applied to substantially reduce the volume of data retained while minimizing the risk of losing interesting records. In addition, it discusses data structures, based on space- and time-efficient hashing methods, that can be used to index network data over very large, sparse index spaces such as those presented by IPv6 or by connection tuples that contain multiple IP addresses along with service and protocol information.
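The abstract does not name the hashing structures the paper uses. As an illustration only, the sketch below assumes a Bloom filter, one common space-efficient hashed structure for membership tests over a sparse key space, and shows how a connection tuple with IPv6 addresses might be indexed. The class `BloomFilter`, the helper `flow_key`, the tuple layout, and the sizing parameters are all hypothetical and are not taken from the paper.

```python
import hashlib
import ipaddress


class BloomFilter:
    """Fixed-size bit array probed by k hash positions (illustrative only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        # Sizes are arbitrary assumptions, not values from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive k positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: bytes):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def flow_key(src, dst, proto, dport):
    """Pack a hypothetical (src, dst, protocol, port) tuple into bytes."""
    return (
        ipaddress.ip_address(src).packed
        + ipaddress.ip_address(dst).packed
        + proto.to_bytes(1, "big")
        + dport.to_bytes(2, "big")
    )


index = BloomFilter()
key = flow_key("2001:db8::1", "2001:db8::2", 6, 443)
index.add(key)
print(key in index)
```

The filter's memory use is fixed by `num_bits` regardless of how large the underlying key space is, which is why structures of this kind suit sparse spaces; the cost is a tunable false-positive rate, so a hit would normally be confirmed against the archived records themselves.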