Search engine click spam detection based on bipartite graph propagation

Authors:
Xin Li; Min Zhang; Yiqun Liu; Shaoping Ma; Yijiang Jin; Liyun Ru
Affiliations:
Tsinghua University, Beijing, China (all authors)
Venue:
Proceedings of the 7th ACM international conference on Web search and data mining
Year:
2014

Abstract

Using search engines to retrieve information has become an important part of people's daily lives. Most search engines treat click information as an important factor in document ranking. As a result, some websites try to obtain higher rankings by fraudulently inflating the number of clicks on their pages, a practice referred to as "Click Spam". Based on an analysis of the features of fraudulent clicks, this paper proposes a novel automatic click spam detection approach that consists of (1) modeling each user session as a sequence of triples that, to the best of our knowledge for the first time, captures not only the user's actions but also the object of each action and the time interval between consecutive actions; (2) a user-session bipartite graph propagation algorithm that uses known cheating users to find more cheating sessions; and (3) a pattern-session bipartite graph propagation algorithm that identifies cheating session patterns, yielding higher precision and recall in click spam detection. Experiments on real-world logs from a Chinese commercial search engine, containing approximately 80 million user clicks per day, show that 2.6% of all clicks are detected as spam, with a precision of up to 97%.
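The abstract names the three components but not their update rules, so the following Python sketch is only one plausible reading under stated assumptions, not the authors' algorithm. It (1) turns time-ordered events into (action, object, interval-bucket) triples, using hypothetical bucket limits of 1 s, 5 s, 30 s and 300 s; (2) runs a generic seeded score propagation on the user-session graph, in which known cheating users are clamped at 1.0 and every other node takes a damped average of its neighbours (TrustRank-like, with a hypothetical damping `alpha=0.85`); and (3) reruns the same propagation on a session-pattern graph seeded with sessions whose first-stage score reaches a hypothetical `threshold=0.5`. Two further simplifications are assumptions: a session node is identified by its triple sequence, so identical scripted sessions from different users share one node, and contiguous bigrams of triples stand in for mined sequential patterns. All function names (`interval_bucket`, `to_session`, `propagate`, `patterns_of`, `detect`) are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical triple: (action, object of the action, interval bucket).
Triple = Tuple[str, str, int]
# Assumption: a session node is identified by its triple sequence, so identical
# scripted sessions issued by different users collapse into one node.
Session = Tuple[Triple, ...]

BUCKET_LIMITS = (1, 5, 30, 300)  # seconds; hypothetical thresholds


def interval_bucket(gap_seconds: float) -> int:
    """Map the gap since the previous action to a coarse interval bucket."""
    for bucket, limit in enumerate(BUCKET_LIMITS):
        if gap_seconds < limit:
            return bucket
    return len(BUCKET_LIMITS)


def to_session(events: Iterable[Tuple[float, str, str]]) -> Session:
    """Turn time-ordered (timestamp, action, object) events into triples."""
    triples: List[Triple] = []
    prev_t = None
    for t, action, obj in events:
        gap = 0.0 if prev_t is None else t - prev_t
        triples.append((action, obj, interval_bucket(gap)))
        prev_t = t
    return tuple(triples)


def propagate(edges, seeds, alpha=0.85, iters=30):
    """Generic seeded score propagation on a bipartite graph (not the paper's).

    Left seeds are clamped at 1.0. Each right node takes the mean score of its
    left neighbours; each non-seed left node takes alpha times the mean score
    of its right neighbours. Returns (left_scores, right_scores) in [0, 1].
    """
    left_nbrs: Dict = defaultdict(set)
    right_nbrs: Dict = defaultdict(set)
    for x, y in edges:
        left_nbrs[x].add(y)
        right_nbrs[y].add(x)
    left = {x: 1.0 if x in seeds else 0.0 for x in left_nbrs}
    right: Dict = {}
    for _ in range(iters):
        right = {y: sum(left[x] for x in xs) / len(xs) for y, xs in right_nbrs.items()}
        left = {
            x: 1.0 if x in seeds else alpha * sum(right[y] for y in ys) / len(ys)
            for x, ys in left_nbrs.items()
        }
    return left, right


def patterns_of(session: Session, n: int = 2) -> Set[Session]:
    """Stand-in patterns: contiguous n-grams of triples (no sequential mining)."""
    return {session[i:i + n] for i in range(len(session) - n + 1)}


def detect(user_sessions: Dict[str, List[Session]], cheating_users: Set[str],
           threshold: float = 0.5) -> Dict[Session, float]:
    """Two-stage sketch: user-session propagation, then pattern-session."""
    # Stage 1: users (left) -> sessions (right), seeded with known cheaters.
    us_edges = [(u, s) for u, ss in user_sessions.items() for s in ss]
    _, session_score = propagate(us_edges, cheating_users)
    # Sessions at or above the (hypothetical) threshold seed the second stage.
    seeds = {s for s, v in session_score.items() if v >= threshold}
    # Stage 2: sessions (left) -> patterns (right).
    ps_edges = [(s, p) for s in session_score for p in patterns_of(s)]
    final, _ = propagate(ps_edges, seeds)
    # Sessions too short to yield a pattern keep their stage-1 score.
    return {s: final.get(s, v) for s, v in session_score.items()}
```

In a toy run with one known cheating user, a scripted session shared by three users scores above the seed threshold in the first stage, while a look-alike session from an unrelated user scores 0.0 there and is surfaced only in the second stage (at about 0.59) through the click pattern it shares with the scripted session. Averaging rather than summing keeps scores in [0, 1] and dilutes evidence that arrives through widely shared, innocuous patterns.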