Resolving user identities over social networks through supervised learning and rich similarity features

Authors:
André Nunes;Pável Calado;Bruno Martins
Affiliations:
INESC-ID, Porto Salvo, Portugal;INESC-ID, Porto Salvo, Portugal;INESC-ID, Porto Salvo, Portugal
Venue:
Proceedings of the 27th Annual ACM Symposium on Applied Computing
Year:
2012

References:
[1] Duplicate Record Detection: A Survey. IEEE Transactions on Knowledge and Data Engineering.

Cited by:
What's in a name?: an unsupervised approach to link users across communities. Proceedings of the sixth ACM international conference on Web search and data mining.

Abstract

This paper describes an approach for resolving user identifiers in the context of social networks, using techniques from the area of duplicate record detection [1]. We reduce the user identity resolution problem to a binary classification task, where the goal is to classify pairs of identifiers as either belonging to the same person or not. Each pair is represented as a feature vector that combines multiple sources of similarity (e.g., similarity between profile information, between descriptions of people's interests, and between people's friend lists). We report on a thorough evaluation of different machine learning algorithms and different feature sets, concluding that user identities can be resolved with high accuracy.
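
The pairwise formulation in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three similarity features (string similarity of names, Jaccard overlap of interests, Jaccard overlap of friend lists), the logistic regression classifier, and the toy profiles are all assumptions chosen for illustration, and the abstract does not state which features or learning algorithms the paper actually uses.

```python
import math
from difflib import SequenceMatcher


def jaccard(a, b):
    """Jaccard overlap between two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0


def pair_features(p, q):
    """Represent a pair of profiles as a vector of similarity scores."""
    return [
        SequenceMatcher(None, p["name"].lower(), q["name"].lower()).ratio(),
        jaccard(p["interests"], q["interests"]),
        jaccard(p["friends"], q["friends"]),
    ]


def predict_proba(w, x):
    """Estimated probability that the pair refers to the same person."""
    z = sum(wi * xi for wi, xi in zip(w, x + [1.0]))  # last weight is the bias
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=2000, lr=0.5):
    """Fit logistic regression weights (bias last) by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, label in zip(X, y):
            err = predict_proba(w, x) - label
            for i, xi in enumerate(x + [1.0]):
                grad[i] += err * xi
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


# Hypothetical profiles from two networks ("a*" and "b*"); all values are made up.
profiles = {
    "a1": {"name": "Ana Silva", "interests": ["jazz", "hiking"],
           "friends": ["rui", "joao"]},
    "b1": {"name": "ana.silva", "interests": ["jazz", "hiking", "film"],
           "friends": ["rui", "joao", "marta"]},
    "a2": {"name": "Pedro Costa", "interests": ["chess"], "friends": ["ines"]},
    "b2": {"name": "pcosta", "interests": ["chess", "go"],
           "friends": ["ines", "luis"]},
    "a3": {"name": "Marta Reis", "interests": ["surf"], "friends": ["tiago"]},
}

# Labeled pairs: 1 = same person, 0 = different people.
pairs = [("a1", "b1", 1), ("a2", "b2", 1), ("a1", "b2", 0),
         ("a2", "b1", 0), ("a3", "b1", 0), ("a3", "b2", 0)]

X = [pair_features(profiles[i], profiles[j]) for i, j, _ in pairs]
y = [label for _, _, label in pairs]
w = train_logistic(X, y)

for (i, j, label), x in zip(pairs, X):
    print(i, j, label, round(predict_proba(w, x), 3))
```

The sketch scores only the pairs it was trained on, so it shows the mechanics of the reduction rather than any measure of accuracy. A real evaluation would need held-out labeled pairs and a comparison across classifiers and feature sets, as the abstract describes.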