Preprocessing DNS log data for effective data mining

  • Authors:
  • Mark E. Snyder; Ravi Sundaram; Mayur Thakur

  • Affiliations:
  • Department of Computer Science, Missouri S&T, Rolla, MO; Department of Computer and Information Science, Northeastern University, Boston, MA; Google Inc., Mountain View, CA

  • Venue:
  • ICC'09: Proceedings of the 2009 IEEE International Conference on Communications
  • Year:
  • 2009

Abstract

The Domain Name System (DNS) plays a critical role in directing Internet traffic. The ability to mine DNS log data effectively for statistical patterns helps defend DNS servers against bandwidth attacks. Processing DNS log data is a data-intensive problem and presents the challenges characteristic of that class. When log capture fails, or when the DNS server experiences an outage (scheduled or unscheduled), the normal traffic pattern for that server is obscured. Simple linear interpolation across the holes in the data does not preserve features such as traffic peaks, which can occur during an attack and are therefore of particular interest. We demonstrate a method for estimating values for missing portions of time-sensitive DNS log data. This method would also be suitable for a variety of other datasets containing time series values in which certain portions are missing.
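
The abstract's point about linear interpolation can be illustrated with a small sketch. The abstract does not describe the authors' estimation method, so the code below does not attempt to reproduce it. It only contrasts plain linear interpolation with one simple stand-in, a per-slot average over other periods, on synthetic query counts. The function names `linear_fill` and `seasonal_fill`, the six-slot period, and the data are all assumptions made for illustration.

```python
from statistics import mean


def linear_fill(series):
    """Fill runs of None by a straight line between the nearest known neighbours."""
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        left = filled[start - 1] if start > 0 else None
        right = filled[i] if i < n else None
        for k in range(start, i):
            if left is None and right is None:
                continue  # nothing known at all; leave the gap
            if left is None:
                filled[k] = right  # gap at the start: carry the right value back
            elif right is None:
                filled[k] = left  # gap at the end: carry the left value forward
            else:
                frac = (k - start + 1) / (i - start + 1)
                filled[k] = left + frac * (right - left)
    return filled


def seasonal_fill(series, period):
    """Fill each None with the mean of known values at the same slot in other periods.

    Illustrative stand-in only; this is not the method from the paper.
    """
    filled = list(series)
    for k, value in enumerate(series):
        if value is None:
            same_slot = [
                series[j]
                for j in range(k % period, len(series), period)
                if series[j] is not None
            ]
            if same_slot:
                filled[k] = mean(same_slot)
    return filled


# Synthetic query counts: three periods of six slots, with a peak in slot 3.
# The second period has an outage covering slots 2 to 4 (hypothetical data).
queries = [
    10, 12, 15, 90, 14, 11,
    11, 13, None, None, None, 12,
    9, 11, 16, 94, 15, 10,
]

linear = linear_fill(queries)
seasonal = seasonal_fill(queries, period=6)

print("linear  :", linear[8:11])
print("seasonal:", seasonal[8:11])
```

On this data the linear fill bridges the outage with values close to 12, so the peak that the other two periods show in slot 3 vanishes, while the per-slot average restores a value near 92 at that position. A per-slot average has its own weakness: it can only reproduce peaks that recur at the same slot, so it would not recover a one-off attack spike that fell inside the gap.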