The increasing importance of search engines to commercial web sites has given rise to a phenomenon we call "web spam", that is, web pages that exist only to mislead search engines into (mis)leading users to certain web sites. Web spam is a nuisance to users as well as search engines: users have a harder time finding the information they need, and search engines have to cope with an inflated corpus, which in turn causes their cost per query to increase. Therefore, search engines have a strong incentive to weed out spam web pages from their index.

We propose that some spam web pages can be identified through statistical analysis: certain classes of spam pages, in particular those that are machine-generated, diverge in some of their properties from the properties of web pages at large. We have examined a variety of such properties, including linkage structure, page content, and page evolution, and have found that outliers in the statistical distribution of these properties are highly likely to be caused by web spam.

This paper describes the properties we have examined, gives the statistical distributions we have observed, and shows which kinds of outliers are highly correlated with web spam.
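The abstract does not say how an "outlier in the statistical distribution" of a property is detected. The sketch below illustrates one possible approach, under stated assumptions. It is not the authors' method. It assumes a per-page integer property whose distribution is roughly power-law (out-degree is used as a hypothetical example). It fits a least-squares line to log(frequency) against log(value) and flags values whose observed frequency exceeds the fitted frequency by a large factor. Machine-generated pages that share a template would be expected to produce such a spike. The function name `power_law_outliers` and the threshold `min_ratio=10.0` are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def power_law_outliers(values, min_ratio=10.0):
    """Hypothetical helper: return property values (e.g. page out-degrees)
    whose observed frequency is at least `min_ratio` times the frequency
    predicted by a least-squares line fitted in log-log space.

    Assumes the property is roughly power-law distributed; zero values are
    skipped because log(0) is undefined.
    """
    counts = Counter(v for v in values if v > 0)
    if len(counts) < 3:
        return []  # too few distinct values to fit a meaningful line

    # Fit log(count) = intercept + slope * log(value) by ordinary least squares.
    xs = [math.log(v) for v in counts]
    ys = [math.log(counts[v]) for v in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Flag values whose frequency sits far above the fitted power law.
    outliers = []
    for v, c in counts.items():
        expected = math.exp(intercept + slope * math.log(v))
        if c / expected >= min_ratio:
            outliers.append(v)
    return sorted(outliers)


# Usage on synthetic data: an approximate power law with exponent -2 over
# out-degrees 1..50, plus 2000 extra pages that all have out-degree 40.
base = []
for d in range(1, 51):
    base.extend([d] * round(10000 / d ** 2))
spiked = base + [40] * 2000
print(power_law_outliers(base))    # []
print(power_law_outliers(spiked))  # [40]
```

The function flags property values, not pages. The pages that have a flagged value are only spam candidates and would still need inspection. A single very large spike also pulls the fitted line toward itself, so a robust regression could be substituted if several spikes are expected.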