Poster: CUD: crowdsourcing for URL spam detection

Authors:
Jun Hu;Hongyu Gao;Zhichun Li;Yan Chen
Affiliations:
Huazhong University of Science and Technology, Wuhan, China;Northwestern University, Evanston, IL, USA;NEC Research Labs, Princeton, NJ, USA;Northwestern University, Evanston, IL, USA
Venue:
Proceedings of the 18th ACM conference on Computer and communications security
Year:
2011

Abstract

The prevalence of spam URLs in Internet services such as email, social networks, blogs, and online forums has become a serious problem. These spam URLs host spam advertisements, phishing attempts, and malware, all of which are harmful to normal users. Existing URL blacklist approaches offer limited protection. Although recent machine-learning-based URL classification approaches demonstrate good accuracy and reasonable throughput, they are based on observations from existing spam URLs and struggle to detect new spam URLs when attackers employ new strategies. In this paper, we present CUD (Crowdsourcing for URL spam Detection) as a supplement to existing detection tools. CUD leverages human intelligence for URL classification through crowdsourcing. CUD crawls user comments about spam URLs that already exist on the Internet and applies sentiment analysis from natural language processing to analyze those comments automatically and detect spam URLs. Since CUD does not use features directly associated with the URLs and their landing pages, it is more robust when attackers change their strategies. Through evaluation, we find that up to 70% of URLs have user comments online. CUD achieves a true positive rate of 86.8% with a false positive rate of 0.9%. Moreover, about 75% of the spam URLs that CUD detects are missed by other approaches. Therefore, CUD can serve as a good complement to other approaches.
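
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of classifying a URL from the sentiment of user comments about it. It assumes the comments have already been crawled, and it substitutes a simple cue-word count for the sentiment analysis the authors use. The word lists, the function names `comment_score` and `is_spam_url`, and the threshold are all hypothetical and are not taken from the poster.

```python
import re

# Hypothetical cue words: the poster does not publish its lexicon or model.
NEGATIVE_CUES = {"spam", "scam", "phishing", "malware", "virus", "fraud", "fake"}
POSITIVE_CUES = {"safe", "legit", "legitimate", "trusted", "official"}


def comment_score(comment):
    """Return (#positive cues - #negative cues) for one user comment."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    negative = sum(token in NEGATIVE_CUES for token in tokens)
    positive = sum(token in POSITIVE_CUES for token in tokens)
    return positive - negative


def is_spam_url(comments, threshold=-1.0):
    """Classify a URL from the comments written about it.

    Returns None when there are no comments (no crowd signal),
    True when the mean comment score is at or below the assumed threshold,
    and False otherwise.
    """
    if not comments:
        return None
    mean_score = sum(comment_score(c) for c in comments) / len(comments)
    return mean_score <= threshold


if __name__ == "__main__":
    print(is_spam_url(["This link is a scam, total phishing", "spam, do not click"]))
    print(is_spam_url(["Official site, safe to use"]))
    print(is_spam_url([]))
```

A cue-word count like this ignores negation ("not safe") and sarcasm, which is one reason a real system would need proper sentiment analysis. The `None` result for URLs without comments reflects the abstract's observation that only a portion of URLs have user comments online, so such a detector can only supplement other tools.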