Elements of information theory
Elements of information theory
A language modeling approach to information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Spam, damn spam, and statistics: using statistical analysis to locate spam web pages
Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004
Lucene in Action (In Action series)
Lucene in Action (In Action series)
Detecting spam web pages through content analysis
Proceedings of the 15th international conference on World Wide Web
Detecting nepotistic links by language model disagreement
Proceedings of the 15th international conference on World Wide Web
A reference collection for web spam
ACM SIGIR Forum
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Transductive link spam detection
AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Measuring similarity to detect qualified links
AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Know your neighbors: web spam detection using the web topology
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Combating web spam with trustrank
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Exploring linguistic features for web spam detection: a preliminary study
AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
Web spam identification through content and hyperlinks
AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
Web spam detection: new classification features based on qualified link analysis and language models
IEEE Transactions on Information Forensics and Security
Spam detection in online classified advertisements
Proceedings of the 2011 Joint WICOW/AIRWeb Workshop on Web Quality
Foundations and Trends in Information Retrieval
Multi-facets quality assessment of online opinionated expressions
WISS'10 Proceedings of the 2010 international conference on Web information systems engineering
Webspam demotion: Low complexity node aggregation methods
Neurocomputing
Text mining and probabilistic language modeling for online review spam detection
ACM Transactions on Management Information Systems (TMIS)
Comment spam detection by sequence mining
Proceedings of the fifth ACM international conference on Web search and data mining
Spotting fake reviewer groups in consumer reviews
Proceedings of the 21st international conference on World Wide Web
Content-based analysis to detect Arabic web spam
Journal of Information Science
Fighting against web spam: a novel propagation method based on click-through data
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Detecting Fake Medical Web Sites Using Recursive Trust Labeling
ACM Transactions on Information Systems (TOIS)
Detecting malicious tweets in trending topics using a statistical analysis of language
Expert Systems with Applications: An International Journal
Effectively Detecting Content Spam on the Web Using Topical Diversity Measures
WI-IAT '12 Proceedings of the The 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Automatically generated spam detection based on sentence-level topic information
Proceedings of the 22nd international conference on World Wide Web companion
Combating Web spam through trust-distrust propagation with confidence
Pattern Recognition Letters
Hi-index | 0.00 |
This paper applies a language model approach to different sources of information extracted from a Web page, in order to provide high quality indicators in the detection of Web Spam. Two pages linked by a hyperlink should be topically related, even though this were a weak contextual relation. For this reason we have analysed different sources of information of a Web page that belongs to the context of a link and we have applied Kullback-Leibler divergence on them for characterising the relationship between two linked pages. Moreover, we combine some of these sources of information in order to obtain richer language models. Given the different nature of internal and external links, in our study we also distinguished these types of links getting a significant improvement in classification tasks. The result is a system that improves the detection of Web Spam on two large and public datasets such as WEBSPAM-UK2006 and WEBSPAM-UK2007.