Reduce and aggregate: similarity ranking in multi-categorical bipartite graphs

Authors:
Alessandro Epasto;Jon Feldman;Silvio Lattanzi;Stefano Leonardi;Vahab Mirrokni
Affiliations:
Sapienza University of Rome, Rome, Italy;Google Research, New York, NY, USA;Google Research, New York, NY, USA;Sapienza University of Rome, Rome, Italy;Google Research, New York, USA
Venue:
Proceedings of the 23rd international conference on World wide web
Year:
2014

Abstract

We study the problem of computing similarity rankings in large-scale multi-categorical bipartite graphs, where the two sides of the graph represent actors and items, and the items are partitioned into an arbitrary set of categories. The problem has several real-world applications, including identifying competing advertisers and suggesting related queries in an online advertising system, or finding users with similar interests and suggesting content to them. In these settings, we are interested in computing on-the-fly rankings of similar actors, given an actor and an arbitrary subset of categories of interest. Two main challenges arise. First, the bipartite graphs are huge and often lopsided (e.g., the system might receive billions of queries while presenting only millions of advertisers). Second, the sheer number of possible combinations of categories prevents the pre-computation of the results for all of them. We present a novel algorithmic framework that addresses both issues for the computation of several graph-theoretical similarity measures, including the number of common neighbors and Personalized PageRank. We show how to tackle the imbalance in the graphs to speed up the computation, and we provide efficient real-time algorithms for computing rankings for an arbitrary subset of categories. Finally, we demonstrate experimentally the accuracy of our approach on real-world data, using both public graphs and a very large dataset from Google AdWords.
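
As an illustration of the problem setting only, and not the authors' algorithm, the following minimal Python sketch ranks actors by common neighbors for an arbitrary subset of categories. It relies on one observation: because the items are partitioned into categories, the number of common neighbors over a set of categories is the sum of the per-category counts, so those counts can be computed once and added up at query time. The function names, the advertiser/query labels, and the category names are all hypothetical, and the quadratic pair enumeration is only suitable for toy inputs.

```python
from collections import defaultdict


def per_category_common_neighbors(edges, category_of):
    """Precompute counts[category][a][b] = number of items in `category`
    adjacent to both actor a and actor b.

    edges: iterable of (actor, item) pairs.
    category_of: dict mapping each item to its single category.
    """
    actors_of_item = defaultdict(set)
    for actor, item in edges:
        actors_of_item[item].add(actor)

    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for item, actors in actors_of_item.items():
        cat = category_of[item]
        # Every ordered pair of distinct actors sharing this item gains one
        # common neighbor within the item's category.
        for a in actors:
            for b in actors:
                if a != b:
                    counts[cat][a][b] += 1
    return counts


def rank_similar(counts, actor, categories, top_k=10):
    """Aggregate the per-category counts for the requested categories and
    return the top_k (other_actor, score) pairs, highest score first."""
    totals = defaultdict(int)
    for cat in categories:
        for other, c in counts.get(cat, {}).get(actor, {}).items():
            totals[other] += c
    # Ties are broken by actor name so the output is deterministic.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical toy graph: advertisers (actors) linked to queries (items).
edges = [
    ("adv1", "q1"), ("adv2", "q1"),
    ("adv1", "q2"), ("adv2", "q2"), ("adv3", "q2"),
    ("adv1", "q3"), ("adv3", "q3"),
]
category_of = {"q1": "shoes", "q2": "shoes", "q3": "travel"}

counts = per_category_common_neighbors(edges, category_of)
shoes_only = rank_similar(counts, "adv1", ["shoes"])
shoes_and_travel = rank_similar(counts, "adv1", ["shoes", "travel"])
```

In this toy graph, restricting to the "shoes" category ranks adv2 above adv3 for adv1, while adding "travel" brings adv3 level with adv2. Additivity of this kind holds for common-neighbor counts; measures such as Personalized PageRank do not decompose so simply, and this sketch does not attempt to cover them.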