In this paper, we consider the problem of linking users across multiple online communities. Specifically, we focus on the alias-disambiguation step of the user linking task, which differentiates users who share the same username. We begin by quantitatively analyzing the importance of this step through a survey of 153 volunteers and an experimental analysis of a large About.me dataset (75,472 users). The analysis shows that an alias-disambiguation solution can address a major part of the user linking problem in terms of the coverage of true pairwise decisions (46.8%). To the best of our knowledge, this is the first study of human behavior with regard to the use of online usernames. We then cast alias disambiguation as a pairwise classification problem and propose a novel unsupervised approach. The key idea is to label training instances automatically based on two observations: (a) a rare username is likely owned by a single natural person, e.g., pennystar88 yields a positive instance; (b) a common username is likely owned by different natural persons, e.g., tank yields a negative instance. We propose using the n-gram probabilities of usernames to estimate how rare or common they are. We verify these two observations on a dataset from Yahoo! Answers. Empirical evaluations on 53 forums verify (a) the effectiveness of classifiers trained on the automatically generated data and (b) that the rareness and commonness of usernames help user linking. We also analyze the cases where the classifiers fail.
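The automatic labeling idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a character-bigram model with add-one smoothing, estimated from a tiny made-up word list, and it uses hypothetical thresholds (`rare_below`, `common_above`) to decide which same-username pairs become training instances. The function names and the toy corpus are likewise illustrative assumptions.

```python
import math
from collections import Counter

def train_char_bigram(corpus):
    """Count character bigrams over a list of strings (toy language model)."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for word in corpus:
        padded = "^" + word.lower() + "$"  # start and end markers
        vocab.update(padded)
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, len(vocab)

def log_prob(username, model):
    """Total log-probability of a username under the add-one smoothed model.

    Lower values mean a rarer username. Characters never seen in the
    corpus are handled only approximately by the smoothing.
    """
    bigrams, contexts, vocab_size = model
    padded = "^" + username.lower() + "$"
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )

def auto_label(username, model, rare_below, common_above):
    """Label a pair of accounts that share `username`.

    Returns 1 (assume same person) for rare usernames, 0 (assume different
    persons) for common ones, and None when the score falls in between.
    Both thresholds are hypothetical and would need tuning on real data.
    """
    score = log_prob(username, model)
    if score <= rare_below:
        return 1
    if score >= common_above:
        return 0
    return None

# Toy corpus standing in for a large n-gram resource (an assumption).
toy_corpus = ["tank", "tiger", "star", "penny", "thanks", "bank", "tanker"]
model = train_char_bigram(toy_corpus)

for name in ["tank", "pennystar88"]:
    label = auto_label(name, model, rare_below=-20.0, common_above=-10.0)
    print(name, round(log_prob(name, model), 2), label)
```

In this sketch the unnormalized log-probability is used on purpose, so longer usernames tend to score as rarer; pairs labeled `None` are simply left out of the training set.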