Identifying web spam with user behavior analysis

Authors:
Yiqun Liu;Rongwei Cen;Min Zhang;Shaoping Ma;Liyun Ru
Affiliations:
Tsinghua University, Beijing, China P.R.;Tsinghua University, Beijing, China P.R.;Tsinghua University, Beijing, China P.R.;Tsinghua University, Beijing, China P.R.;Tsinghua University, Beijing, China P.R.
Venue:
AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
Year:
2008

Abstract

Combating Web spam has become one of the top challenges for Web search engines. State-of-the-art spam detection techniques are usually designed for specific, known types of Web spam and are both ineffective and inefficient against newly appeared spam. By analyzing user behavior in Web access logs, we propose a spam page detection algorithm based on Bayesian learning. The main contributions of our work are: (1) User visiting patterns of spam pages are studied, and three user behavior features are proposed to separate Web spam pages from ordinary pages. (2) A novel spam detection framework is proposed that can detect unknown and newly appeared spam types with the help of user behavior analysis. Preliminary experiments on large-scale Web access log data (containing over 2.74 billion user clicks) show the effectiveness of the proposed features and detection framework.
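
The abstract does not name the three user behavior features or give the exact Bayesian formulation, so the following is only a minimal sketch of how a Bayesian classifier over per-page behavior features might look. It uses a generic naive Bayes model with Laplace smoothing, which may differ from the paper's actual method. The class name `NaiveBayesSpamScorer`, the `bucket` helper, the feature values, and the assumption that each feature is a rate in [0, 1] are all hypothetical and chosen for illustration.

```python
from collections import defaultdict
from math import exp, log


def bucket(value, n_buckets=5):
    """Map a rate in [0, 1] to a discrete bin index (hypothetical discretization)."""
    return max(0, min(int(value * n_buckets), n_buckets - 1))


class NaiveBayesSpamScorer:
    """Generic naive Bayes over discretized behavior features (illustrative only)."""

    def __init__(self, n_features, n_buckets=5):
        self.n_features = n_features
        self.n_buckets = n_buckets
        # counts[label][feature_index][bucket] -> number of training pages
        self.counts = {
            label: [defaultdict(int) for _ in range(n_features)]
            for label in (0, 1)
        }
        self.totals = {0: 0, 1: 0}

    def fit(self, rows, labels):
        """rows: list of feature vectors; labels: 1 for spam, 0 for ordinary."""
        for row, label in zip(rows, labels):
            self.totals[label] += 1
            for i, value in enumerate(row):
                self.counts[label][i][bucket(value, self.n_buckets)] += 1

    def _log_joint(self, row, label):
        # log P(label) + sum_i log P(feature_i | label), with Laplace smoothing
        n = self.totals[0] + self.totals[1]
        lp = log((self.totals[label] + 1) / (n + 2))
        for i, value in enumerate(row):
            c = self.counts[label][i][bucket(value, self.n_buckets)]
            lp += log((c + 1) / (self.totals[label] + self.n_buckets))
        return lp

    def spam_probability(self, row):
        """Posterior P(spam | features) under the naive independence assumption."""
        log_spam = self._log_joint(row, 1)
        log_ham = self._log_joint(row, 0)
        m = max(log_spam, log_ham)  # subtract the max for numerical stability
        e_spam, e_ham = exp(log_spam - m), exp(log_ham - m)
        return e_spam / (e_spam + e_ham)


# Made-up training data: three hypothetical behavior rates per page.
spam_rows = [[0.9, 0.8, 0.95], [0.85, 0.9, 0.9], [0.95, 0.85, 0.8]]
ordinary_rows = [[0.1, 0.2, 0.05], [0.2, 0.1, 0.15], [0.05, 0.15, 0.1]]

scorer = NaiveBayesSpamScorer(n_features=3)
scorer.fit(spam_rows + ordinary_rows, [1, 1, 1, 0, 0, 0])
print(scorer.spam_probability([0.9, 0.9, 0.9]))
print(scorer.spam_probability([0.1, 0.1, 0.1]))
```

Because the score depends only on how users reach and leave a page, not on the page's content or link structure, a model of this kind could in principle flag spam types it was never explicitly designed for, which is the property the abstract emphasizes.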