Mining coherent anomaly collections on web data

Authors:
Hanbo Dai;Feida Zhu;Ee-Peng Lim;HweeHwa Pang
Affiliations:
Singapore Management University, Singapore, Singapore;Singapore Management University, Singapore, Singapore;Singapore Management University, Singapore, Singapore;Singapore Management University, Singapore, Singapore
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Abstract

The recent boom of weblogs and social media has made it increasingly important to identify suspicious users with unusual behavior, such as spammers or fraudulent reviewers. A typical spamming strategy is to employ multiple dummy accounts to collectively promote a target, be it a URL or a product. Consequently, these suspicious accounts exhibit coherent anomalous behavior that is identifiable as a collection. In this paper, we propose the concept of a Coherent Anomaly Collection (CAC) to capture this kind of collection, and put forward an efficient algorithm that simultaneously finds the top-K disjoint CACs together with their anomalous behavior patterns. Compared with existing approaches, our algorithm finds disjoint anomaly collections with coherent extreme behavior without requiring either their number or their sizes to be specified. Results on real Twitter data show that our approach discovers meaningful and informative hashtag spammer groups of various sizes that are hard to detect with clustering-based methods.
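
The abstract does not give the formal CAC definition or the mining algorithm, so the following is only a toy illustration of the general idea (groups of accounts that are all extreme on the same features), not the paper's method. The function name `toy_coherent_groups`, the z-score test, the threshold values, and the size-times-pattern-length ranking are all assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import mean, pstdev


def toy_coherent_groups(counts, k=2, z_threshold=1.5, min_size=2):
    """Toy stand-in, NOT the paper's algorithm.

    counts: {user: {hashtag: usage_count}}
    Returns up to k (pattern, users) pairs, where pattern is the list of
    hashtags on which every user in the group is unusually heavy.
    """
    users = sorted(counts)
    hashtags = sorted({h for c in counts.values() for h in c})

    # Per-hashtag mean and population standard deviation across all users.
    stats = {}
    for h in hashtags:
        values = [counts[u].get(h, 0) for u in users]
        stats[h] = (mean(values), pstdev(values))

    # A user's pattern is the set of hashtags where the z-score is extreme.
    groups = defaultdict(list)
    for u in users:
        pattern = frozenset(
            h for h in hashtags
            if stats[h][1] > 0
            and (counts[u].get(h, 0) - stats[h][0]) / stats[h][1] > z_threshold
        )
        if pattern:
            groups[pattern].append(u)

    # Grouping by exact pattern makes the groups disjoint by construction.
    # Assumed ranking: larger groups with longer patterns come first.
    ranked = sorted(
        ((p, us) for p, us in groups.items() if len(us) >= min_size),
        key=lambda item: (-len(item[1]) * len(item[0]), sorted(item[0])),
    )
    return [(sorted(p), us) for p, us in ranked[:k]]
```

On made-up data in which two accounts push one hashtag heavily, three others push a second hashtag, and the rest post normally, this sketch returns the two groups with their hashtags. Unlike the approach described in the abstract, it requires a hand-picked threshold and only groups accounts whose extreme-hashtag sets match exactly.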