Semi-supervised ranking on very large graphs with rich metadata

Authors:
Bin Gao; Tie-Yan Liu; Wei Wei; Taifeng Wang; Hang Li
Affiliations:
Microsoft Research Asia, Beijing, China; Microsoft Research Asia, Beijing, China; Huazhong University of Science and Technology, Wuhan, China; Microsoft Research Asia, Beijing, China; Microsoft Research Asia, Beijing, China
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

Graph ranking plays an important role in many applications, such as page ranking on web graphs and entity ranking on social networks. In such applications, rich information on nodes and edges, as well as explicit or implicit human supervision, is often available in addition to the graph structure. Conventional algorithms (e.g., PageRank and HITS), in contrast, compute ranking scores from the graph structure alone. A natural question therefore arises: how can all of this information be leveraged, effectively and efficiently, to compute more accurate ranking scores than the conventional algorithms, when the graph is also very large? Previous work has tackled this problem only partially, and the solutions it proposed are not satisfactory. This paper addresses the problem and proposes a general framework, as well as an efficient algorithm, for graph ranking. Specifically, we define a semi-supervised learning framework for ranking the nodes of a very large graph and, within this framework, derive an efficient algorithm called Semi-Supervised PageRank. In the algorithm, the objective function is defined on the basis of a Markov random walk on the graph. The transition probability and the reset probability of the Markov model are defined as parametric models of features on nodes and edges. By minimizing the objective function subject to a number of constraints derived from the supervision information, we simultaneously learn the optimal parameters of the model and the optimal ranking scores of the nodes. Finally, we show that, by exploiting the sparsity of the graph and implementing the algorithm in the MapReduce framework, it can efficiently handle a graph with a billion nodes. Experiments on real data from a commercial search engine show that the proposed algorithm can outperform previous algorithms on several tasks.
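The abstract describes the model only in words. As a rough illustration, the toy Python sketch below shows the general idea under stated assumptions; it is not the paper's formulation. All names (`parametric_pagerank`, `preference_penalty`, `fit_by_grid`, `omega`, `phi`) are hypothetical. It assumes that transition probabilities out of a node are a log-linear function of edge features (weights `omega`), that reset probabilities are a log-linear function of node features (weights `phi`), and that ranking scores are the stationary distribution of the damped walk, computed by power iteration over the sparse edge list. Supervision is expressed as pairs (i, j) meaning "node i should outrank node j" and scored with a hinge penalty. Instead of the paper's joint optimization of parameters and scores under constraints, the sketch simply picks the best parameter setting from a small grid.

```python
import math


def dot(w, x):
    """Inner product of a parameter vector and a feature vector."""
    return sum(a * b for a, b in zip(w, x))


def normalize_exp(scores):
    """Turn raw scores into positive probabilities summing to 1 (assumed log-linear model)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def parametric_pagerank(n, edges, edge_feats, node_feats, omega, phi,
                        alpha=0.85, iters=200, tol=1e-12):
    """Stationary scores of a damped random walk whose transition probabilities
    depend on edge features (weights omega) and whose reset probabilities
    depend on node features (weights phi). Hypothetical parameterization."""
    # Sparse, row-normalized transitions: only existing edges are stored.
    out = {u: [] for u in range(n)}
    for u, v in edges:
        out[u].append(v)
    trans = {}
    for u, nbrs in out.items():
        if nbrs:
            probs = normalize_exp([dot(omega, edge_feats[(u, v)]) for v in nbrs])
            trans[u] = list(zip(nbrs, probs))
    reset = normalize_exp([dot(phi, y) for y in node_feats])

    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [0.0] * n
        dangling = 0.0
        # One pass over the edges: node u pushes alpha * pi[u] * p(u -> v) to v.
        for u in range(n):
            if u in trans:
                for v, p in trans[u]:
                    new[v] += alpha * pi[u] * p
            else:
                dangling += pi[u]  # nodes without out-links leak their mass
        # Teleport mass plus leaked mass is redistributed along the reset vector.
        teleport = (1.0 - alpha) + alpha * dangling
        new = [x + teleport * r for x, r in zip(new, reset)]
        delta = sum(abs(a - b) for a, b in zip(new, pi))
        pi = new
        if delta < tol:
            break
    return pi


def preference_penalty(pi, prefs, margin=0.0):
    """Hinge penalty for supervision pairs (i, j) meaning 'node i should outrank node j'."""
    return sum(max(0.0, pi[j] - pi[i] + margin) for i, j in prefs)


def fit_by_grid(n, edges, edge_feats, node_feats, prefs, candidates, margin=0.0):
    """Pick the (omega, phi) candidate whose scores violate the supervision least.
    A crude stand-in for a proper gradient-based optimization."""
    best = None
    for omega, phi in candidates:
        pi = parametric_pagerank(n, edges, edge_feats, node_feats, omega, phi)
        loss = preference_penalty(pi, prefs, margin)
        if best is None or loss < best[0]:
            best = (loss, omega, phi, pi)
    return best


# Toy usage: node 0 links to nodes 1 and 2, which both link back to node 0.
edges = [(0, 1), (0, 2), (1, 0), (2, 0)]
edge_feats = {(0, 1): [1.0], (0, 2): [0.0], (1, 0): [0.0], (2, 0): [0.0]}
node_feats = [[0.0], [0.0], [0.0]]
prefs = [(1, 2)]  # supervision: node 1 should rank above node 2
candidates = [([-2.0], [0.0]), ([0.0], [0.0]), ([2.0], [0.0])]
loss, omega, phi, pi = fit_by_grid(3, edges, edge_feats, node_feats, prefs,
                                   candidates, margin=0.01)
print(omega, round(loss, 6))  # prints: [2.0] 0.0
```

With `omega = [0.0]` the walk is symmetric and nodes 1 and 2 tie, so the margin is violated; a positive weight on the edge feature of 0 → 1 tilts the walk toward node 1 and satisfies the preference. Each power-iteration step touches only existing edges, which is why sparsity matters at scale. In a MapReduce setting one could, for instance, map over edges to emit `(v, alpha * pi[u] * p)` contributions and reduce by target node, although this decomposition is an illustration rather than the implementation described in the paper.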