Web services such as Google, Facebook, and Twitter are recurring victims of abuse, and their plight will only worsen as more attackers are drawn to their large user bases. Many attackers hire cheap human labor to carry out their schemes, connecting with potential workers via crowdsourcing and freelancing sites such as Mechanical Turk and Freelancer.com. To identify solicitations for abuse jobs, these sites need ways to distinguish such tasks from ordinary jobs. In this paper, we show how to discover clusters of abuse tasks using latent Dirichlet allocation (LDA), an unsupervised method for topic modeling in large corpora of text. Applying LDA to hundreds of thousands of unlabeled job postings from Freelancer.com, we find that it discovers clusters of related abuse jobs and identifies the prevalent words that distinguish them. Finally, we use the clusters from LDA to profile the population of workers who bid on abuse jobs and the population of buyers who post the corresponding project descriptions.
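To make the idea of topic discovery concrete, the following is a minimal sketch of LDA fitted to a handful of job postings. Everything in it is an assumption for illustration: the postings are invented toy strings (not Freelancer.com data), the names `fit_lda`, `doc_topics`, and `top_words` are hypothetical, and the hyperparameters are arbitrary. It uses collapsed Gibbs sampling, one common way to fit LDA; the abstract does not say which inference procedure the paper uses.

```python
import random
import re

# Toy postings invented for illustration; not taken from any real site.
postings = [
    "need 500 verified email accounts created with phone verification",
    "create bulk accounts with unique email and phone verified",
    "solve captcha images fast typing data entry workers needed",
    "captcha typing job solve images per hour payment",
    "design a logo and website banner for my bakery",
    "logo design and business card design for new website",
]
STOPWORDS = {"a", "and", "for", "my", "with", "per", "need", "needed", "new"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop a few stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns vocabulary and count tables."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                       # total tokens per topic
    z = []                                      # topic assignment of each token
    for d, doc in enumerate(docs):              # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1
    return vocab, ndk, nkw

def doc_topics(ndk, alpha):
    """Smoothed per-document topic proportions."""
    K = len(ndk[0])
    return [[(c + alpha) / (sum(row) + K * alpha) for c in row] for row in ndk]

def top_words(vocab, nkw, n=3):
    """The n most frequent words assigned to each topic."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in nkw
    ]

ALPHA = 0.1
docs = [tokenize(p) for p in postings]
vocab, ndk, nkw = fit_lda(docs, num_topics=3, alpha=ALPHA)
for k, words in enumerate(top_words(vocab, nkw)):
    print(f"topic {k}: {', '.join(words)}")
```

The top words of each topic play the role of the "prevalent words" that characterize a cluster, and the rows returned by `doc_topics(ndk, ALPHA)` indicate which cluster each posting mostly belongs to. On a corpus this small the result depends heavily on the seed and hyperparameters, so the output should be read as a demonstration of the mechanics rather than of the paper's findings.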