Twitter has become one of the most popular microblogging sites for people to broadcast (or "tweet") their thoughts to the world in 140 characters or fewer. Since these messages are available for public consumption, one might expect tweets not to contain private or incriminating information. Nevertheless, we observe a large number of users who unwittingly post sensitive information about themselves and about other people, with potentially negative consequences for those concerned. While some awareness exists of such privacy issues on social networks such as Twitter and Facebook, there has been no quantitative, scientific study addressing this problem. In this paper we make three major contributions. First, we characterize the nature of privacy leaks on Twitter to understand what types of private information people reveal there. We specifically analyze three types of leaks: divulging vacation plans, tweeting under the influence of alcohol, and revealing medical conditions. Second, using this characterization, we build automatic classifiers that detect incriminating tweets on these three topics in real time, in order to demonstrate the real threat posed to users by, e.g., burglars and law enforcement. Third, we characterize who leaks information and how. We study both self-incriminating primary leaks and secondary leaks that reveal sensitive information about others, as well as the prevalence of leaks in status updates and conversation tweets. We also conduct a cross-cultural study to investigate the prevalence of leaks in tweets originating from the United States, the United Kingdom, and Singapore. Finally, we discuss how our classification system can be used as a defense mechanism to alert users to potential privacy leaks.
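To make the idea of an automatic leak classifier concrete, the following is a minimal sketch, not the authors' system. The abstract does not say which learning algorithm or features were used; a multinomial Naive Bayes model over bag-of-words tokens is assumed here purely for illustration. The class name `NaiveBayesTweetClassifier`, the labels `vacation_leak` and `other`, and the six training tweets are all hypothetical and invented for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTweetClassifier:
    """Hypothetical multinomial Naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training tweets
        self.vocab = set()           # every token seen during training

    def fit(self, tweets, labels):
        for text, label in zip(tweets, labels):
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data, for illustration only (not from the paper).
train_tweets = [
    "leaving for hawaii tomorrow, gone for two weeks",
    "off on vacation next week, house will be empty",
    "flying to paris on friday for our holiday",
    "watching a documentary about hawaii",
    "great coffee this morning",
    "my team won the game last night",
]
train_labels = ["vacation_leak"] * 3 + ["other"] * 3

clf = NaiveBayesTweetClassifier().fit(train_tweets, train_labels)
print(clf.predict("we are flying out on vacation tomorrow"))
print(clf.predict("great game last night"))
```

A detector used as the defense mechanism mentioned above would presumably call something like `predict` on a draft tweet before posting and warn the user when a leak label comes back; that integration is likewise an assumption, not something the abstract specifies.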