Scalable proximity estimation and link prediction in online social networks

  • Authors:
  • Han Hee Song;Tae Won Cho;Vacha Dave;Yin Zhang;Lili Qiu

  • Affiliations:
  • The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA

  • Venue:
  • Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Proximity measures quantify the closeness or similarity between nodes in a social network and form the basis of a range of applications in social sciences, business, information technology, computer networks, and cyber security. It is challenging to estimate proximity measures in online social networks due to their massive scale (with millions of users) and dynamic nature (with hundreds of thousands of new nodes and millions of edges added daily). To address this challenge, we develop two novel methods to efficiently and accurately approximate a large family of proximity measures. We also propose a novel incremental update algorithm to enable near real-time proximity estimation in highly dynamic social networks. Evaluation based on a large amount of real data collected in five popular online social networks shows that our methods are accurate and can easily scale to networks with millions of nodes. To demonstrate the practical values of our techniques, we consider a significant application of proximity estimation: link prediction, i.e., predicting which new edges will be added in the near future based on past snapshots of a social network. Our results reveal that (i) the effectiveness of different proximity measures for link prediction varies significantly across different online social networks and depends heavily on the fraction of edges contributed by the highest degree nodes, and (ii) combining multiple proximity measures consistently yields the best link prediction accuracy.