In this paper, we continue our investigations of "web spam": the injection of artificially created pages into the web in order to influence search engine results and drive traffic to certain pages for fun or profit. We consider some previously undescribed techniques for automatically detecting spam pages, and examine the effectiveness of these techniques both in isolation and when aggregated using classification algorithms. When combined, our heuristics correctly identify 2,037 (86.2%) of the 2,364 spam pages in our judged collection of 17,168 pages (spam makes up 13.8% of the collection), while misidentifying 526 spam and non-spam pages (3.1% of the collection).
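The percentages quoted above can be checked with a few lines of arithmetic. The sketch below uses only the four counts given in the abstract; the variable names are illustrative. The split of the 526 misidentified pages into missed spam and wrongly flagged non-spam is an inference from the wording "spam and non-spam pages", not a figure stated in the abstract.

```python
# Counts quoted in the abstract.
total_pages = 17_168    # judged collection
spam_pages = 2_364      # pages judged to be spam
spam_detected = 2_037   # spam pages correctly identified
misidentified = 526     # spam and non-spam pages classified wrongly


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


spam_share = pct(spam_pages, total_pages)        # share of spam in the collection
detection_rate = pct(spam_detected, spam_pages)  # share of spam that was caught
error_rate = pct(misidentified, total_pages)     # share of all pages misclassified

# Spam pages that were not identified (follows directly from the counts).
missed_spam = spam_pages - spam_detected

# ASSUMPTION: the 526 errors are missed spam plus non-spam flagged as spam,
# so the remainder is the number of wrongly flagged non-spam pages.
flagged_non_spam = misidentified - missed_spam

print(spam_share, detection_rate, error_rate, missed_spam, flagged_non_spam)
```

All three rounded percentages agree with the figures in the abstract, which suggests the counts were transcribed consistently.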