Empirical comparison of IP reputation databases

  • Authors: Jernej Porenta; Mojca Ciglarič
  • Affiliations: Academic and Research Network of Slovenia; University of Ljubljana
  • Venue: Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference
  • Year: 2011

Abstract

IP reputation is a common technique for addressing the email spam problem. Commercial implementations are available, but the algorithms behind them are confidential. A few open-source implementations (gossip, RepuScore, IP-GroupREP, etc.) exist, but few studies compare them with their commercial counterparts. For this reason, we made an empirical comparison of six popular commercial IP reputation databases and three different open-source IP reputation algorithms. We built our own IP reputation database from our email corpus, which contains 931,576 email messages from real-time email traffic at an academic ISP. After processing and classifying the corpus, we compared the results of the open-source IP reputation algorithms with the commercial IP reputation databases, using the Spearman rank correlation coefficient to identify the optimal parameters for the open-source algorithms. The results show lower correlation coefficients as the number of emails from a single IP rises. The open-source algorithms performed adequately for IP addresses that sent more than five and fewer than 50 emails, while (surprisingly) the correlation dropped for IP addresses that sent more. For this reason, we believe the open-source algorithms need additional fine-tuning to become comparable to their commercial counterparts, whose IP reputation scores are built from many sensors around the world. We also compared the commercial IP reputation databases with one another and found mixed correlations between them, which raised many questions about the algorithms used to build their IP reputation scores. The research also identified the problem of finding a good methodology for comparing IP reputation databases.
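The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the score scales, the example IP addresses (taken from the 192.0.2.0/24 documentation range), and the email counts are all hypothetical. It computes the Spearman rank correlation coefficient between two sets of reputation scores for the IPs they share, restricted to IPs whose email count falls within a chosen range, such as more than five and fewer than 50.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman coefficient: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(vx * vy)


def compare_in_bucket(scores_a, scores_b, email_counts, low, high):
    """Correlate two score tables over shared IPs with low < count < high."""
    ips = sorted(
        ip
        for ip in set(scores_a) & set(scores_b)
        if low < email_counts.get(ip, 0) < high
    )
    return spearman([scores_a[ip] for ip in ips], [scores_b[ip] for ip in ips])


# Hypothetical data: a locally computed score table, a commercial one on a
# different scale, and the number of emails seen per IP in the local corpus.
local_scores = {"192.0.2.1": 0.9, "192.0.2.2": 0.4, "192.0.2.3": 0.1, "192.0.2.4": 0.7}
commercial_scores = {"192.0.2.1": 80, "192.0.2.2": 35, "192.0.2.3": 10, "192.0.2.5": 50}
email_counts = {"192.0.2.1": 12, "192.0.2.2": 30, "192.0.2.3": 7, "192.0.2.4": 3}

rho = compare_in_bucket(local_scores, commercial_scores, email_counts, 5, 50)
print(f"Spearman rho for IPs with 5 < emails < 50: {rho:.2f}")
```

Because the coefficient depends only on rank order, the two databases may use different score scales. Repeating the call with different `low` and `high` bounds would show how the correlation varies with the number of emails per IP.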