Feature analysis is an important task in information extraction: appropriate features improve the performance of any classification or clustering algorithm. In this paper we analyze features that can be used to cluster spam emails in real time and thereby improve IP blacklisting. These features also make domain blacklisting easier, because large numbers of IP addresses are grouped together. The features we explore include the sender and subject of the email, email attachments, and stylistic and semantic features. These features ensure appropriate clustering of spam originating from dominant hosts. We measure the effectiveness of each feature in terms of how well it groups emails and gathers domain/IP information, and thus how much it improves domain blacklisting.
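As a rough illustration of the idea of grouping spam by a feature and collecting the IP addresses each group gathers, the following minimal sketch may help. It is not the method used in the paper: the record layout, the `subject_key` normaliser, the `group_ips` helper, and the sample emails are all hypothetical, and exact-key grouping stands in for whatever clustering algorithm is actually used.

```python
import re
from collections import defaultdict


def subject_key(subject):
    """Hypothetical normaliser: lowercase, mask digit runs, drop punctuation."""
    s = subject.lower()
    s = re.sub(r"\d+", "#", s)        # "50" and "70" both become "#"
    s = re.sub(r"[^a-z#]+", " ", s)   # punctuation and whitespace become one space
    return s.strip()


def group_ips(emails, key_fn):
    """Group emails by a feature key and collect the sending IPs per group."""
    clusters = defaultdict(set)
    for email in emails:
        clusters[key_fn(email)].add(email["ip"])
    return dict(clusters)


# Made-up sample records (assumed layout: sender, subject, ip).
emails = [
    {"sender": "a@pharma.example", "subject": "Save 50% on meds!!", "ip": "192.0.2.1"},
    {"sender": "b@pharma.example", "subject": "save 70% on MEDS", "ip": "192.0.2.2"},
    {"sender": "c@watch.example", "subject": "Replica watches", "ip": "198.51.100.7"},
]

# Two candidate features: normalised subject and sender domain.
by_subject = group_ips(emails, lambda e: subject_key(e["subject"]))
by_domain = group_ips(emails, lambda e: e["sender"].split("@")[-1].lower())

for name, clusters in (("subject", by_subject), ("sender domain", by_domain)):
    largest = max(len(ips) for ips in clusters.values())
    print(f"{name}: {len(clusters)} groups, largest group gathers {largest} IPs")
```

Comparing features by how many distinct IP addresses their groups gather is one simple proxy for the effectiveness measure described above; a real evaluation would also need to check that each group is actually homogeneous.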