Detecting nepotistic links by language model disagreement

Authors:
András A. Benczúr; István Bíró; Károly Csalogány; Máté Uher
Affiliations:
Hungarian Academy of Sciences (MTA SZTAKI) and Eötvös University, Budapest (all four authors)
Venue:
Proceedings of the 15th international conference on World Wide Web
Year:
2006

Abstract

In this short note we demonstrate the applicability of hyperlink downweighting by means of language model disagreement. The method filters out hyperlinks that have no relevance to the target page, without requiring whitelists, blacklists, or human interaction. We address various forms of nepotism, such as common maintainers, ads, link exchanges, and misused affiliate programs. We tested the method on a crawl of 31 million pages from the .de domain, using a manually classified random sample of 1,000 pages.
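The abstract does not specify how disagreement is measured. The sketch below makes several assumptions. A link is represented by its anchor text plus nearby words on the linking page. The target is represented by its page text. Both are unigram language models, smoothed by linear interpolation with an add-one background model. Disagreement is the Kullback-Leibler divergence between the link model and the target model. The function names, the smoothing weight `lam`, and the cutoff `threshold` are hypothetical choices for illustration, not values from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def background_model(corpus_tokens, vocab):
    """Add-one smoothed collection model over a fixed vocabulary.

    Every vocabulary word gets positive probability, so the KL terms below
    never divide by zero. Assumes corpus_tokens is a subset of vocab.
    """
    counts = Counter(corpus_tokens)
    denom = len(corpus_tokens) + len(vocab)
    return {w: (counts[w] + 1) / denom for w in vocab}


def smoothed_model(tokens, background, lam):
    """Linear interpolation: lam * P_ML(w | tokens) + (1 - lam) * P_bg(w)."""
    if not tokens:
        return dict(background)  # no evidence: fall back to the background
    counts = Counter(tokens)
    n = len(tokens)
    return {w: lam * counts[w] / n + (1 - lam) * p for w, p in background.items()}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q are defined over the same vocabulary."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def link_weight(anchor_context, target_text, corpus_text, lam=0.8, threshold=1.0):
    """Return (divergence, weight) for one hyperlink.

    anchor_context: anchor text plus nearby words on the linking page (assumed).
    lam, threshold: hypothetical values, not taken from the paper.
    """
    src, tgt, bg = tokenize(anchor_context), tokenize(target_text), tokenize(corpus_text)
    vocab = set(src) | set(tgt) | set(bg)
    background = background_model(bg, vocab)
    d = kl_divergence(smoothed_model(src, background, lam),
                      smoothed_model(tgt, background, lam))
    # Hard cutoff: a link whose context disagrees with the target gets weight 0.
    return d, (1.0 if d <= threshold else 0.0)


if __name__ == "__main__":
    corpus = "cheap flights berlin hotel casino poker bonus travel news weather"
    target = "book cheap flights to berlin and munich"
    for anchor in ("cheap flights to berlin", "online casino poker bonus"):
        d, w = link_weight(anchor, target, corpus)
        print(f"{anchor!r}: divergence={d:.2f} weight={w}")
```

With these toy strings, the travel anchor stays below the cutoff and keeps weight 1.0. The casino anchor exceeds the cutoff and gets weight 0.0. A hard cutoff matches the abstract's "filters out" wording. Returning a decreasing function of the divergence, such as exp(-d), would instead downweight links gradually.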