Web spam can significantly degrade the quality of search engine results, so commercial search engines have a strong incentive to detect spam pages efficiently and accurately. In this paper we present a spam detection system that combines link-based and content-based features and uses the topology of the Web graph by exploiting the link dependencies among Web pages. We find that linked hosts tend to belong to the same class: either both are spam or both are non-spam. We demonstrate three methods of incorporating the Web graph topology into the predictions obtained by our base classifier: (i) clustering the host graph and assigning a label to all hosts in each cluster by majority vote, (ii) propagating the predicted labels to neighboring hosts, and (iii) using the predicted labels of neighboring hosts as new features and retraining the classifier. The result is an accurate system for detecting Web spam, tested on a large public dataset, using algorithms that can be applied in practice to large-scale Web data.
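The three ways of using graph topology listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `alpha` mixing weight, the single propagation step, the tie-breaking rule, and the toy host graph are all assumptions made for illustration, and the abstract does not specify the clustering algorithm, the propagation scheme, or the base classifier.

```python
from collections import defaultdict


def cluster_majority_vote(pred, cluster_of):
    """(i) Relabel every host with the majority predicted label of its cluster.

    pred maps host -> 0 (non-spam) or 1 (spam); cluster_of maps host -> cluster id.
    Ties are resolved as non-spam (an assumption of this sketch).
    """
    votes = defaultdict(list)
    for host, label in pred.items():
        votes[cluster_of[host]].append(label)
    majority = {c: int(2 * sum(v) > len(v)) for c, v in votes.items()}
    return {host: majority[cluster_of[host]] for host in pred}


def propagate(prob, neighbors, alpha=0.5):
    """(ii) One smoothing step: mix each host's spam score with its neighbors' mean.

    alpha is a hypothetical mixing weight, not a value taken from the paper.
    """
    out = {}
    for host, p in prob.items():
        nbrs = neighbors.get(host, [])
        if nbrs:
            mean = sum(prob[n] for n in nbrs) / len(nbrs)
            out[host] = (1 - alpha) * p + alpha * mean
        else:
            out[host] = p  # isolated host keeps its own score
    return out


def neighbor_features(pred, neighbors):
    """(iii) One extra feature per host: the fraction of neighbors predicted spam.

    Retraining the classifier on the augmented features is left out of this sketch.
    """
    feats = {}
    for host in pred:
        nbrs = neighbors.get(host, [])
        feats[host] = sum(pred[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
    return feats


# Toy host graph (hypothetical data): a, b, c link to each other; d and e link to each other.
neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": ["e"], "e": ["d"]}
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
prob = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1, "e": 0.2}  # base-classifier spam scores
pred = {h: int(p >= 0.5) for h, p in prob.items()}          # thresholded labels

voted = cluster_majority_vote(pred, cluster_of)
smoothed = propagate(prob, neighbors)
extra = neighbor_features(pred, neighbors)
```

In this toy graph, host `c` is scored as non-spam by the base classifier but is linked only to hosts predicted as spam, so all three methods pull it toward the spam class. This is the "linked hosts tend to belong to the same class" observation put to work.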