Resolving user identities over social networks through supervised learning and rich similarity features

  • Authors:
  • André Nunes; Pável Calado; Bruno Martins

  • Affiliations:
  • INESC-ID, Porto Salvo, Portugal; INESC-ID, Porto Salvo, Portugal; INESC-ID, Porto Salvo, Portugal

  • Venue:
  • Proceedings of the 27th Annual ACM Symposium on Applied Computing
  • Year:
  • 2012

Abstract

This paper describes an approach for resolving user identifiers in the context of social networks, using techniques from the area of duplicate record detection [1]. We reduce the user identity resolution problem to a binary classification task, where the goal is to classify each pair of identifiers as either belonging to the same person or not. Each pair is represented as a feature vector that combines multiple sources of similarity (e.g., similarity between profile information, between descriptions of people's interests, and between people's friend lists). We report on a thorough evaluation of different machine learning algorithms and different feature sets, concluding that user identities can be resolved with high accuracy.
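
The pairwise formulation described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the specific similarity functions or learning algorithms, so the string ratio from Python's difflib, the Jaccard coefficient, the small logistic regression, the profile fields (name, interests, friends), and the toy profiles are all hypothetical stand-ins rather than the authors' features, classifier, or data.

```python
import math
from difflib import SequenceMatcher


def jaccard(a, b):
    """Jaccard coefficient of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # assumption: no evidence counts as no similarity
    return len(a & b) / len(a | b)


def pair_features(p, q):
    """Map a pair of profiles to a vector of similarity scores."""
    name_sim = SequenceMatcher(None, p["name"].lower(), q["name"].lower()).ratio()
    return [
        name_sim,                                  # profile information
        jaccard(p["interests"], q["interests"]),   # interests
        jaccard(p["friends"], q["friends"]),       # friend lists
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=500, lr=0.5):
    """Fit a logistic regression with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - t  # gradient of the log loss w.r.t. the linear score
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    """True if the pair is classified as belonging to the same person."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5


# Hypothetical toy profiles: a1/a2 and b1/b2 each belong to one person.
a1 = {"name": "andre.nunes", "interests": ["ml", "music", "chess"],
      "friends": ["u1", "u2", "u3"]}
a2 = {"name": "andre_nunes", "interests": ["ml", "music"],
      "friends": ["u1", "u2", "u4"]}
b1 = {"name": "maria.silva", "interests": ["football", "cooking"],
      "friends": ["u7", "u8"]}
b2 = {"name": "msilva", "interests": ["cooking", "football", "travel"],
      "friends": ["u7", "u8", "u9"]}

pairs = [(a1, a2), (b1, b2), (a1, b1), (a2, b2), (a1, b2), (a2, b1)]
labels = [1, 1, 0, 0, 0, 0]  # 1 = same person, 0 = different people

X = [pair_features(p, q) for p, q in pairs]
w, b = train_logistic(X, labels)
```

Calling `predict(w, b, pair_features(p, q))` on a new pair of profiles then returns whether the sketch considers the two identifiers to belong to the same person. Treating each similarity source as a separate feature lets the learner weight them itself, which is the general idea behind combining multiple sources of similarity in one vector.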