Search engines have become the de facto starting point for information acquisition on the Web. However, because of the web spam phenomenon, search results are not always as good as desired. Moreover, spam keeps evolving, which makes the problem of providing high-quality search even more challenging. Over the last decade, research on adversarial information retrieval has attracted considerable interest from both academia and industry. In this paper we present a systematic review of web spam detection techniques, focusing on algorithms and their underlying principles. We categorize all existing algorithms into three categories based on the type of information they use: content-based methods, link-based methods, and methods based on non-traditional data such as user behaviour, clicks, and HTTP sessions. In turn, we subdivide the link-based category into five groups based on the ideas and principles used: label propagation, link pruning and reweighting, label refinement, graph regularization, and feature-based methods. We also define the concept of web spam numerically and provide a brief survey of various spam forms. Finally, we summarize the observations and underlying principles applied for web spam detection.
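A small sketch can make the label-propagation idea concrete. The code below assumes a TrustRank-style formulation: a PageRank whose random jumps go only to a hand-picked set of trusted seed pages. Trust therefore flows outward from the seeds along hyperlinks, and pages that the seeds cannot reach receive none of it. The function name `propagate_trust`, the toy graph, the damping factor of 0.85, and the fixed iteration count are illustrative assumptions. None of them are details taken from this survey.

```python
from typing import Dict, List, Set


def propagate_trust(
    out_links: Dict[str, List[str]],
    seeds: Set[str],
    damping: float = 0.85,   # assumed value, as in common PageRank setups
    iterations: int = 50,    # assumed fixed iteration count instead of a convergence test
) -> Dict[str, float]:
    """Hypothetical sketch of biased-PageRank label propagation from trusted seeds."""
    if not seeds:
        raise ValueError("at least one trusted seed page is required")

    # Collect every page that appears as a source or as a link target.
    nodes: Set[str] = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)

    # Restart distribution: all teleport mass goes uniformly to the trusted seeds.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    trust = dict(restart)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for src in nodes:
            targets = out_links.get(src, [])
            if not targets:
                continue  # dangling page: its trust mass is dropped in this sketch
            share = damping * trust[src] / len(targets)
            for dst in targets:
                nxt[dst] += share  # each out-link passes on an equal share of trust
        trust = nxt
    return trust


# Toy graph (hypothetical): "spam1"/"spam2" form a link farm that no seed links to.
graph = {"a": ["b"], "b": ["a", "c"], "spam1": ["spam2", "c"], "spam2": ["spam1"]}
scores = propagate_trust(graph, seeds={"a"})
```

In this sketch, trust that reaches a page with no out-links is discarded rather than redistributed, so the scores sum to at most 1. A real system would choose its own handling of dangling pages. Pages with low propagated trust are only candidates for closer inspection, not confirmed spam.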