We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When the streaming nature of the data means that decisions cannot be reconsidered once they have been made, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
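
The idea of sequential decoding on a stream can be illustrated with a minimal sketch. It is not the procedure derived in the paper. It assumes each email is already a numeric feature vector, substitutes a fixed cosine similarity for the learned similarity, and uses a hypothetical `threshold` parameter to decide when to open a new batch. Each email is assigned exactly once and the decision is never revisited. The cost per email is proportional to the number of batches kept open, so the pass is linear in the number of emails only if that number is bounded.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def sequential_decode(stream: Sequence[Sequence[float]],
                      threshold: float = 0.8) -> List[int]:
    """Assign each incoming vector to a batch in a single pass.

    Illustrative assumptions (not from the source): cosine similarity
    stands in for a learned similarity, and `threshold` is a hypothetical
    cut-off below which a new batch is opened.
    """
    sums: List[List[float]] = []  # running feature sum per batch
    labels: List[int] = []        # batch index chosen for each email
    for x in stream:
        best_k, best_s = -1, threshold
        for k, s in enumerate(sums):
            # Cosine to the running sum equals cosine to the batch mean.
            sim = cosine(x, s)
            if sim >= best_s:
                best_k, best_s = k, sim
        if best_k < 0:
            # No existing batch is similar enough: open a new one.
            sums.append(list(x))
            labels.append(len(sums) - 1)
        else:
            # Commit the decision and update the batch summary.
            sums[best_k] = [a + b for a, b in zip(sums[best_k], x)]
            labels.append(best_k)
    return labels


emails = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.95, 0.05],
    [1.0, 0.0, 0.05],
]
print(sequential_decode(emails))
```

Each batch is summarized by a running feature sum, so scoring a new email against a batch takes time independent of the batch size. A supervised variant would replace `cosine` with a similarity whose parameters are fitted to example clusterings.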