Unifying guilt-by-association approaches: theorems and fast algorithms

Authors:
Danai Koutra;Tai-You Ke;U. Kang;Duen Horng Chau;Hsing-Kuo Kenneth Pao;Christos Faloutsos
Affiliations:
School of Computer Science, Carnegie Mellon University;Dept. of Computer Science & Information Engineering, National Taiwan Univ. of Science & Technology;School of Computer Science, Carnegie Mellon University;School of Computer Science, Carnegie Mellon University;Dept. of Computer Science & Information Engineering, National Taiwan Univ. of Science & Technology;School of Computer Science, Carnegie Mellon University
Venue:
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
Year:
2011

Abstract

If several friends of Smith have committed petty thefts, what would you say about Smith? Most people would not be surprised if Smith is a hardened criminal. Guilt-by-association methods combine weak signals to derive stronger ones, and have been extensively used for anomaly detection and classification in numerous settings (e.g., accounting fraud, cyber-security, calling-card fraud). The focus of this paper is to compare and contrast several very successful guilt-by-association methods: Random Walk with Restarts, Semi-Supervised Learning, and Belief Propagation (BP). Our main contributions are two-fold: (a) theoretically, we prove that all the methods result in a similar matrix inversion problem; (b) for practical applications, we develop FaBP, a fast algorithm that yields a 2× speedup and equal or higher accuracy than BP, and is guaranteed to converge. We demonstrate these benefits using synthetic and real datasets, including YahooWeb, one of the largest graphs ever studied with BP.
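The claim that these methods reduce to a matrix inversion problem can be illustrated for one of them. The sketch below is not taken from the paper: it uses the standard textbook form of Random Walk with Restarts, in which the score vector r satisfies (I - c·A·D⁻¹) r = (1 - c) e, and checks that solving this linear system gives the same scores as iterating the walk. The function names, the toy graph, and the continuation probability c = 0.85 are assumptions chosen for illustration; the paper's own notation and its FaBP algorithm are not reproduced here.

```python
import numpy as np


def rwr_direct(A, seed, c=0.85):
    """RWR scores from one linear solve: (I - c * A D^-1) r = (1 - c) e.

    c is the probability of continuing the walk (assumed value);
    1 - c is the probability of restarting at the seed node.
    """
    n = A.shape[0]
    W = A / A.sum(axis=0)  # column-normalize: W[i, j] = A[i, j] / degree(j)
    e = np.zeros(n)
    e[seed] = 1.0  # restart vector concentrated on the seed node
    return np.linalg.solve(np.eye(n) - c * W, (1.0 - c) * e)


def rwr_iterative(A, seed, c=0.85, iters=200):
    """The same scores obtained by repeatedly applying the RWR update."""
    n = A.shape[0]
    W = A / A.sum(axis=0)
    e = np.zeros(n)
    e[seed] = 1.0
    r = e.copy()
    for _ in range(iters):
        r = c * (W @ r) + (1.0 - c) * e
    return r


# Hypothetical 4-node undirected graph with no isolated nodes.
A = np.array(
    [
        [0, 1, 1, 0],
        [1, 0, 1, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

direct = rwr_direct(A, seed=0)
iterative = rwr_iterative(A, seed=0)
print(np.allclose(direct, iterative))
```

The direct solve is only practical for small graphs. On large sparse graphs the iterative form, or a sparse solver, would be the more realistic choice.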