Combating Web spam has become one of the top challenges for Web search engines. State-of-the-art spam detection techniques are usually designed for specific, known types of Web spam and are both ineffective and inefficient against newly emerging spam. By analyzing user behavior in Web access logs, we propose a spam page detection algorithm based on Bayesian learning. The main contributions of our work are: (1) We study how users visit spam pages and propose three user behavior features that separate spam pages from ordinary ones. (2) We propose a novel spam detection framework that uses user behavior analysis to detect unknown and newly emerging spam types. Preliminary experiments on large-scale Web access log data (containing over 2.74 billion user clicks) show the effectiveness of the proposed features and detection framework.
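To make the idea of Bayesian classification over user behavior features more concrete, the following is a minimal sketch, not the algorithm evaluated in this work. It assumes a plain Gaussian naive Bayes model, two labeled classes, and three made-up per-page behavior ratios in [0, 1]; the abstract does not name the actual features, so the columns, the toy values, and the function names `fit_gaussian_nb` and `predict` are purely illustrative.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a class prior and per-feature (mean, variance) for each label."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    total = len(rows)
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, var + 1e-6))  # small floor avoids zero variance
        model[label] = (len(members) / total, stats)
    return model


def log_posterior(model, row):
    """Unnormalized log posterior of each label, treating features as independent."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(model, row):
    """Return the label with the highest log posterior."""
    scores = log_posterior(model, row)
    return max(scores, key=scores.get)


# Toy data (invented): each row holds three hypothetical behavior ratios for one page.
train_rows = [
    (0.90, 0.80, 0.85),
    (0.80, 0.90, 0.75),
    (0.85, 0.70, 0.90),
    (0.20, 0.10, 0.15),
    (0.10, 0.20, 0.25),
    (0.15, 0.30, 0.10),
]
train_labels = ["spam", "spam", "spam", "ordinary", "ordinary", "ordinary"]

model = fit_gaussian_nb(train_rows, train_labels)
print(predict(model, (0.88, 0.82, 0.80)))
print(predict(model, (0.12, 0.18, 0.20)))
```

Because such a classifier depends only on how users reach and leave a page, and not on page content or link structure, a sketch like this suggests why a behavior-based approach need not be tied to one known spamming technique.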