Social network analysis of web links to eliminate false positives in collaborative anti-spam systems

Authors:
Zac Sadan; David G. Schwartz
Affiliations:
Graduate School of Business Administration, Bar-Ilan University, Ramat Gan, Israel (both authors)
Venue:
Journal of Network and Computer Applications
Year:
2011

Abstract

The performance of today's email anti-spam systems is measured primarily by the percentage of false positives (legitimate messages classified as spam) rather than by the percentage of false negatives (spam messages left unblocked). One reliable anti-spam technique is the Uniform Resource Locator (URL)-based filter, which most collaborative signature-based filters employ. URL-based filters track how often each URL appears in incoming email and block bulk email once a predetermined frequency threshold is exceeded. However, this can also block legitimate email that is distributed in bulk. URL-based methods are therefore limited in their ability to prevent false positives, and eliminating this problem is critical for anti-spam systems. We present a technique that complements URL-based filters by using the betweenness centrality of web-page hostnames to prevent legitimate hosts from being blocked in error. The technique was tested on a corpus of 10,000 domains selected at random from the URIBL whitelist and blacklist databases. We generated the appropriate link network for each domain and calculated its betweenness centrality. We found that the betweenness centrality of whitelisted domains is significantly higher than that of blacklisted domains. These results show that betweenness centrality can be a powerful and effective complementary tool for URL-based anti-spam systems: it can identify legitimate hostnames with a high level of accuracy and thus significantly reduce false positives in these systems.
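A minimal sketch of how a frequency-threshold URL filter and a betweenness-based exemption could fit together, written under stated assumptions rather than as the authors' implementation. The abstract does not say how each domain's link network is built, whether links are treated as directed, or which betweenness cutoff separates whitelisted from blacklisted domains. Here a single hypothetical hostname link graph stands in for those networks, edges are assumed directed and unweighted, betweenness is computed with Brandes' algorithm as raw (unnormalized) scores, and the helper names (`hosts_in`, `blocked_hosts`), hostnames, and both thresholds are illustrative only. Following the abstract's finding that whitelisted domains score higher, a bulk host is exempted when its betweenness reaches the cutoff.

```python
from collections import Counter, deque
from urllib.parse import urlparse


def betweenness_centrality(graph):
    """Raw (unnormalized) betweenness via Brandes' algorithm.

    `graph` maps a hostname to the set of hostnames it links to. Edges are
    assumed directed and unweighted (an assumption, not from the paper).
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, queue = [], deque([s])
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, popping nodes in non-increasing BFS distance.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def hosts_in(message):
    """Hypothetical helper: hostnames of http(s) URLs in a whitespace-split message."""
    return {
        urlparse(token).hostname
        for token in message.split()
        if token.startswith(("http://", "https://"))
    }


def blocked_hosts(messages, link_graph, freq_threshold, bc_threshold):
    """Hosts seen in more than `freq_threshold` messages, minus those whose
    betweenness is at or above `bc_threshold` (the hypothetical exemption)."""
    counts = Counter()
    for message in messages:
        counts.update(hosts_in(message))  # each host counted once per message
    bc = betweenness_centrality(link_graph)
    return {
        host
        for host, n in counts.items()
        if n > freq_threshold and bc.get(host, 0.0) < bc_threshold
    }


# Hypothetical link graph: news.example lies on many shortest paths.
LINK_GRAPH = {
    "a.example": {"news.example"},
    "b.example": {"news.example"},
    "news.example": {"c.example", "d.example"},
    "spam.example": {"a.example"},
}
MESSAGES = (
    ["Today's headlines: https://news.example/today"] * 3
    + ["Cheap pills http://spam.example/buy"] * 3
)

# Frequency threshold alone (no exemption): both bulk hosts are blocked.
print(sorted(blocked_hosts(MESSAGES, LINK_GRAPH, 2, float("inf"))))
# prints ['news.example', 'spam.example']

# With the betweenness exemption: only the low-betweenness host is blocked.
print(sorted(blocked_hosts(MESSAGES, LINK_GRAPH, 2, 1.0)))
# prints ['spam.example']
```

In this toy graph `news.example` has a raw betweenness of 6 and `spam.example` has 0, so the exemption removes the false positive while the spam host stays blocked. For larger graphs, NetworkX's `betweenness_centrality` with `normalized=False` on a `DiGraph` computes the same raw scores; whether scores are normalized changes the scale on which any cutoff would have to be chosen.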