Duplicate Record Detection: A Survey
IEEE Transactions on Knowledge and Data Engineering
What's in a name?: an unsupervised approach to link users across communities
Proceedings of the sixth ACM international conference on Web search and data mining
This paper describes an approach for resolving user identifiers in the context of social networks, using techniques from the area of duplicate record detection [1]. We reduce the user identity resolution problem to a binary classification task, where the goal is to classify each pair of identifiers as either belonging to the same person or not. Each pair is represented as a feature vector that combines multiple sources of similarity (e.g., similarity between profile information, between descriptions of people's interests, and between people's friend lists). We report on a thorough evaluation of different machine learning algorithms and different feature sets, concluding that user identities can be resolved with high accuracy.
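The pairwise formulation described above can be illustrated with a minimal sketch. It is not the paper's implementation: the profile fields (`name`, `interests`, `friends`), the choice of similarity measures (character-level sequence similarity and Jaccard overlap), the toy training vectors, and the small hand-written logistic regression are all assumptions made for illustration. The abstract does not state which features or learning algorithms were actually used.

```python
import math
from difflib import SequenceMatcher


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pair_features(p, q):
    """Feature vector for a pair of (hypothetical) profile dicts."""
    name_sim = SequenceMatcher(None, p["name"].lower(), q["name"].lower()).ratio()
    return [
        name_sim,                                  # profile information
        jaccard(p["interests"], q["interests"]),   # interests
        jaccard(p["friends"], q["friends"]),       # friend lists
    ]


def train_logistic(X, y, lr=0.5, epochs=500):
    """Plain stochastic gradient descent for logistic regression."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            g = 1.0 / (1.0 + math.exp(-z)) - t     # gradient of the log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def same_person(w, b, x):
    """Classify a feature vector: True means 'same person'."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z)) >= 0.5


# Toy, made-up training pairs: label 1 = same person, 0 = different people.
X = [[0.9, 0.8, 0.7], [0.85, 0.6, 0.9], [0.2, 0.0, 0.1], [0.3, 0.1, 0.0]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)

# A hypothetical profile compared with itself.
alice = {"name": "Alice Smith", "interests": ["chess", "jazz"], "friends": ["bob", "carol"]}
alice_features = pair_features(alice, alice)
```

In practice the feature vectors would be computed for labeled pairs of real identifiers, and the hand-written learner would be replaced by whichever classifier performs best in evaluation.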