Scalable similarity estimation in social networks: closeness, node labels, and random edge lengths

Authors:
Edith Cohen;Daniel Delling;Fabian Fuchs;Andrew V. Goldberg;Moises Goldszmidt;Renato F. Werneck
Affiliations:
Microsoft Research SVC, Mountain View, CA, USA;Microsoft Research SVC, Mountain View, CA, USA;KIT, Karlsruhe, Germany;Microsoft Research SVC, Mountain View, CA, USA;Microsoft Research SVC, Mountain View, CA, USA;Microsoft Research SVC, Mountain View, CA, USA
Venue:
Proceedings of the first ACM conference on Online social networks
Year:
2013

Abstract

Similarity estimation between nodes based on structural properties of graphs is a basic building block used in the analysis of massive networks for diverse purposes such as link prediction, product recommendations, advertisement, collaborative filtering, and community discovery. While local similarity measures, based on properties of immediate neighbors, are easy to compute, those relying on global properties have better recall. Unfortunately, this better quality comes with a computational price tag. Aiming for both accuracy and scalability, we make several contributions. First, we define closeness similarity, a natural measure that compares two nodes based on the similarity of their relations to all other nodes. Second, we show how the all-distances sketch (ADS) node labels, which are efficient to compute, can support the estimation of closeness similarity and shortest-path (SP) distances in logarithmic query time. Third, we propose the randomized edge lengths (REL) technique and define the corresponding REL distance, which captures both path length and path multiplicity and therefore improves over the SP distance as a similarity measure. The REL distance can also be the basis of closeness similarity and can be estimated using SP computation or the ADS labels. We demonstrate the effectiveness of our measures and the accuracy of our estimates through experiments on social networks with up to tens of millions of nodes.
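
As a rough illustration of the REL idea, the following is a minimal sketch of estimating an expected shortest-path distance under random edge lengths by repeated SP computation. It is not the authors' implementation. The abstract does not state the length distribution, so independent exponential lengths with rate 1 are an assumption here, and the names `dijkstra`, `rel_distance`, `single`, and `multi` are hypothetical. The sketch also assumes an undirected, connected graph with comparable node identifiers.

```python
import heapq
import random


def dijkstra(adj, lengths, source):
    """Shortest-path distances from source under the given edge lengths."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + lengths[(min(u, v), max(u, v))]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def rel_distance(adj, source, target, rounds=200, seed=0):
    """Monte Carlo estimate of the expected SP distance when every edge
    gets an independent random length (ASSUMPTION: exponential, rate 1)."""
    rng = random.Random(seed)
    # One entry per undirected edge, in a fixed order for reproducibility.
    edges = sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})
    total = 0.0
    for _ in range(rounds):
        lengths = {e: rng.expovariate(1.0) for e in edges}
        total += dijkstra(adj, lengths, source)[target]
    return total / rounds


# Nodes 0 and 2 are two hops apart in both graphs, but `multi` offers
# three disjoint two-hop paths instead of one.
single = {0: [1], 1: [0, 2], 2: [1]}
multi = {0: [1, 3, 4], 1: [0, 2], 3: [0, 2], 4: [0, 2], 2: [1, 3, 4]}

print(rel_distance(single, 0, 2), rel_distance(multi, 0, 2))
```

In both toy graphs the hop distance between nodes 0 and 2 is the same, yet the estimate for `multi` comes out smaller because the minimum over several random path lengths tends to be shorter than a single one. This is the sense in which such a distance reflects path multiplicity as well as path length.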