Estimating the number of users behind IP addresses for combating abusive traffic

Authors:
Ahmed Metwally; Matt Paduano
Affiliations:
Google, Inc., Mountain View, CA, USA; Google, Inc., Mountain View, CA, USA
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

This paper addresses estimating the number of users of a specific application behind IP addresses (IPs). This problem is central to combating abusive traffic such as DDoS attacks, ad click fraud, and email spam. We share our experience building a general framework at Google for estimating the number of users behind IPs, hereinafter called the sizes of the IPs. The primary goal of this framework is to combat abusive traffic without violating user privacy. The estimation techniques produce statistically sound size estimates by passively mining aggregated application log data alone, without probing machines or deploying active content such as Java applets. This paper also explores using the estimated sizes to detect and filter abusive traffic. The proposed framework was used to build and deploy an ad click fraud filter at Google. The first 50M clicks tagged by the filter achieved a significant recall over all tagged clicks, and their false positive rate was below 1.4%. For comparison, we simulated a naive IP-based filter that ignores the sizes of the IPs. To reach comparable recall, the naive filter had to tag aggressively, which drove its false positive rate to 37%.
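To make the contrast between a size-aware filter and a naive IP-based filter concrete, here is a toy sketch. It is not the paper's estimator and makes several loud assumptions: each log record is assumed to carry a hypothetical anonymized `user_key`, an IP's size is naively estimated as its count of distinct keys, and the thresholds `max_clicks` and `max_clicks_per_user` are hypothetical parameters. The false positive rate is computed here as the fraction of tagged IPs that are benign, whereas the paper reports it over clicks.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical click-log record: (ip, user_key). user_key stands in for some
# anonymized per-user identifier; it is an assumption, not the paper's signal.
Click = Tuple[str, str]


def summarize(log: Iterable[Click]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (clicks per IP, naive size estimate = distinct user keys per IP)."""
    clicks: Counter = Counter()
    users: Dict[str, Set[str]] = defaultdict(set)
    for ip, user_key in log:
        clicks[ip] += 1
        users[ip].add(user_key)
    sizes = {ip: len(keys) for ip, keys in users.items()}
    return dict(clicks), sizes


def naive_filter(clicks: Dict[str, int], max_clicks: int) -> Set[str]:
    """Tag any IP whose raw click count exceeds a fixed threshold."""
    return {ip for ip, c in clicks.items() if c > max_clicks}


def size_aware_filter(clicks: Dict[str, int], sizes: Dict[str, int],
                      max_clicks_per_user: int) -> Set[str]:
    """Tag an IP only if its clicks exceed what its estimated size explains."""
    return {ip for ip, c in clicks.items()
            if c > max_clicks_per_user * max(sizes.get(ip, 1), 1)}


def false_positive_rate(tagged: Set[str], abusive: Set[str]) -> float:
    """Fraction of tagged IPs that are actually benign (0.0 if none tagged)."""
    return len(tagged - abusive) / len(tagged) if tagged else 0.0


if __name__ == "__main__":
    # Shared IP (e.g. behind a NAT): 50 users, 2 clicks each -> 100 clicks.
    log = [("10.0.0.1", f"u{i}") for i in range(50) for _ in range(2)]
    log += [("10.0.0.2", "bot")] * 30    # one user, 30 clicks (abusive)
    log += [("10.0.0.3", "alice")] * 3   # one user, 3 clicks
    clicks, sizes = summarize(log)
    abusive = {"10.0.0.2"}
    naive = naive_filter(clicks, max_clicks=20)
    aware = size_aware_filter(clicks, sizes, max_clicks_per_user=5)
    print(sorted(naive), false_positive_rate(naive, abusive))  # ['10.0.0.1', '10.0.0.2'] 0.5
    print(sorted(aware), false_positive_rate(aware, abusive))  # ['10.0.0.2'] 0.0
```

Scaling the threshold by the estimated size lets a busy shared IP with many legitimate users pass, while a single-user IP with far fewer clicks is still tagged. A fixed threshold low enough to catch the single-user IP also tags the shared one, which mirrors, in miniature, the higher false positive rate the abstract reports for the naive filter.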