CopyCatch: stopping group attacks by spotting lockstep behavior in social networks

Authors:
Alex Beutel;Wanhong Xu;Venkatesan Guruswami;Christopher Palow;Christos Faloutsos
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA;Facebook, Menlo Park, CA, USA;Carnegie Mellon University, Pittsburgh, PA, USA;Facebook, London, England UK;Carnegie Mellon University, Pittsburgh, PA, USA
Venue:
Proceedings of the 22nd international conference on World Wide Web
Year:
2013

Citing 19
Cited 0

Mean Shift, Mode Seeking, and Clustering

IEEE Transactions on Pattern Analysis and Machine Intelligence
Information-theoretic co-clustering

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
The maximum edge biclique problem is NP-complete

Discrete Applied Mathematics
A needle in a haystack: local one-class optimization

ICML '04 Proceedings of the twenty-first international conference on Machine learning
On mining cross-graph quasi-cliques

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Robust one-class clustering using hybrid global and local search

ICML '05 Proceedings of the 22nd international conference on Machine learning
A Scalable Collaborative Filtering Framework Based on Co-Clustering

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Netprobe: a fast and scalable system for fraud detection in online auction networks

Proceedings of the 16th international conference on World Wide Web
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation

The Journal of Machine Learning Research
Approximation algorithms for co-clustering

Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering

ACM Transactions on Knowledge Discovery from Data (TKDD)
DisCo: Distributed Co-clustering with Map-Reduce: A Case Study towards Petabyte-Scale End-to-End Mining

ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs

ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Data warehousing and analytics infrastructure at facebook

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Facebook immune system

Proceedings of the 4th Workshop on Social Network Systems
MultiAspectForensics: Pattern Mining on Large-Scale Heterogeneous Networks with Tensor Analysis

ASONAM '11 Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining
EigenSpokes: surprising patterns and scalable community chipping in large graphs

PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
Network Anomaly Detection Using Co-clustering

ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)

Quantified Score

Hi-index	0.00

Visualization

Abstract

How can web services that depend on user generated content discern fraudulent input by spammers from legitimate input? In this paper we focus on the social network Facebook and the problem of discerning ill-gotten Page Likes, made by spammers hoping to turn a profit, from legitimate Page Likes. Our method, which we refer to as CopyCatch, detects lockstep Page Like patterns on Facebook by analyzing only the social graph between users and Pages and the times at which the edges in the graph (the Likes) were created. We offer the following contributions: (1) We give a novel problem formulation, with a simple concrete definition of suspicious behavior in terms of graph structure and edge constraints. (2) We offer two algorithms to find such suspicious lockstep behavior - one provably-convergent iterative algorithm and one approximate, scalable MapReduce implementation. (3) We show that our method severely limits "greedy attacks" and analyze the bounds from the application of the Zarankiewicz problem to our setting. Finally, we demonstrate and discuss the effectiveness of CopyCatch at Facebook and on synthetic data, as well as potential extensions to anomaly detection problems in other domains. CopyCatch is actively in use at Facebook, searching for attacks on Facebook's social graph of over a billion users, many millions of Pages, and billions of Page Likes.