Unifying guilt-by-association approaches: theorems and fast algorithms

  • Authors:
  • Danai Koutra;Tai-You Ke;U. Kang;Duen Horng Chau;Hsing-Kuo Kenneth Pao;Christos Faloutsos

  • Affiliations:
  • School of Computer Science, Carnegie Mellon University;Dept. of Computer Science & Information Engineering, National Taiwan Univ. of Science & Technology;School of Computer Science, Carnegie Mellon University;School of Computer Science, Carnegie Mellon University;Dept. of Computer Science & Information Engineering, National Taiwan Univ. of Science & Technology;School of Computer Science, Carnegie Mellon University

  • Venue:
  • ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

If several friends of Smith have committed petty thefts, what would you say about Smith? Most people would not be surprised if Smith is a hardened criminal. Guilt-by-association methods combine weak signals to derive stronger ones, and have been extensively used for anomaly detection and classification in numerous settings (e.g., accounting fraud, cyber-security, calling-card fraud). The focus of this paper is to compare and contrast several very successful, guilt-by-association methods: Random Walk with Restarts, Semi-Supervised Learning, and Belief Propagation (BP). Our main contributions are two-fold: (a) theoretically, we prove that all the methods result in a similar matrix inversion problem; (b) for practical applications, we developed FaBP, a fast algorithm that yields 2× speedup, equal or higher accuracy than BP, and is guaranteed to converge. We demonstrate these benefits using synthetic and real datasets, including YahooWeb, one of the largest graphs ever studied with BP.