We study the problem of computing similarity rankings in large-scale multi-categorical bipartite graphs, where the two sides of the graph represent actors and items, and the items are partitioned into an arbitrary set of categories. The problem has several real-world applications, including identifying competing advertisers and suggesting related queries in an online advertising system, or finding users with similar interests and suggesting content to them. In these settings, we are interested in computing on-the-fly rankings of similar actors, given an actor and an arbitrary subset of categories of interest. Two main challenges arise. First, the bipartite graphs are huge and often lopsided (e.g., the system might receive billions of queries while presenting only millions of advertisers). Second, the sheer number of possible combinations of categories prevents pre-computing the results for all of them. We present a novel algorithmic framework that addresses both issues for the computation of several graph-theoretical similarity measures, including the number of common neighbors and Personalized PageRank. We show how to exploit the imbalance in the graphs to speed up the computation, and we provide efficient real-time algorithms for computing rankings for an arbitrary subset of categories. Finally, we show experimentally the accuracy of our approach on real-world data, using both public graphs and a very large dataset from Google AdWords.
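To make the setting concrete, here is a minimal sketch of one of the measures named above, the number of common neighbors, restricted to a chosen subset of categories. It is an illustration under stated assumptions, not the framework the abstract describes: the function names, the toy advertiser/query data, and the brute-force pair enumeration are all hypothetical. The sketch relies only on one fact that follows from the problem statement: because every item belongs to exactly one category, the common-neighbor count over a set of categories is the sum of the per-category counts, so per-category tables can be precomputed once and combined at query time.

```python
from collections import defaultdict
from itertools import combinations


def per_category_common_neighbors(edges, item_category):
    """Precompute, for each category, how many items each actor pair shares.

    edges: iterable of (actor, item) pairs (hypothetical input format).
    item_category: dict mapping each item to its single category.
    Returns {category: {(a, b): count}} with a < b.
    """
    actors_of_item = defaultdict(set)
    for actor, item in edges:
        actors_of_item[item].add(actor)

    counts = defaultdict(lambda: defaultdict(int))
    for item, actors in actors_of_item.items():
        category = item_category[item]
        # Every pair of actors adjacent to this item shares one neighbor.
        # This is quadratic in the item's degree, fine for a toy example only.
        for a, b in combinations(sorted(actors), 2):
            counts[category][(a, b)] += 1
    return counts


def rank_similar(actor, categories, counts):
    """Rank other actors by common neighbors within the given categories."""
    totals = defaultdict(int)
    for category in categories:
        for (a, b), c in counts.get(category, {}).items():
            if a == actor:
                totals[b] += c
            elif b == actor:
                totals[a] += c
    # Highest count first; ties broken by actor name for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (assumed): advertisers a1..a3 as actors, queries q1..q3 as items.
edges = [
    ("a1", "q1"), ("a2", "q1"),
    ("a1", "q2"), ("a2", "q2"), ("a3", "q2"),
    ("a1", "q3"), ("a3", "q3"),
]
item_category = {"q1": "cars", "q2": "cars", "q3": "travel"}

counts = per_category_common_neighbors(edges, item_category)
print(rank_similar("a1", ["cars"], counts))            # [('a2', 2), ('a3', 1)]
print(rank_similar("a1", ["cars", "travel"], counts))  # [('a2', 2), ('a3', 2)]
```

Note how adding the "travel" category changes the ranking for `a1` without any recomputation of the per-category tables; only the final summation depends on the selected subset. A real system at the scale the abstract mentions would need a far more compact representation than explicit pair tables.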