Combating Spamdexing: Incorporating Heuristics in Link-Based Ranking

Authors:
Tony Abou-Assaleh; Tapajyoti Das
Affiliations:
GenieKnows.com, Halifax, Canada; Email: research@genieknows.com (both authors)
Venue:
Algorithms and Models for the Web-Graph
Year:
2007

Abstract

Users typically locate useful Web pages by querying a search engine. However, today's search engines are seriously threatened by malicious spam pages that attempt to subvert the unbiased searching and ranking services the engines provide. Given the large fraction of Web traffic that originates from search engine referrals and the high potential monetary value of this traffic, it is not surprising that some Web site owners try to manipulate a search engine's ranking function maliciously, giving rise to Web spam. Because identifying spam algorithmically is very difficult, most techniques require either some human assistance or extensive training to deal with spam effectively. We explore the possibility of automatically reducing the number of Web spam pages in a Web collection by analyzing the Web graph, coupled with very simple content analysis. We present an empirical evaluation of our approach on 1 million Web pages from the health domain. Our results clearly indicate that we can effectively filter out a significant fraction of Web spam pages.
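The abstract does not spell out which heuristics are used, so the following is only a minimal sketch, under stated assumptions, of what combining link-based ranking with a very simple content check can look like. It runs standard power-iteration PageRank, but pages flagged by a hypothetical keyword-stuffing test (`keyword_stuffing`, with an arbitrary threshold of 0.3) cast no votes through their outgoing links. The heuristic, its threshold, the function names, and the choice to spread withheld rank uniformly over all pages are illustrative assumptions, not the authors' method.

```python
from collections import Counter
from typing import Callable, Dict, List


def keyword_stuffing(text: str, threshold: float = 0.3) -> bool:
    """Hypothetical content heuristic: flag a page whose single most
    frequent word accounts for more than `threshold` of its words."""
    words = text.lower().split()
    if not words:
        return False
    top_count = Counter(words).most_common(1)[0][1]
    return top_count / len(words) > threshold


def heuristic_pagerank(
    links: Dict[str, List[str]],
    is_suspect: Callable[[str], bool],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power-iteration PageRank where suspect pages cast no link votes.

    Rank that would flow out of a suspect page (or a page with no
    out-links) is instead spread uniformly over all pages, so the
    scores keep summing to 1. Illustrative assumption only.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}

    for _ in range(iterations):
        # Every page first receives its share of the teleport mass.
        new_rank = {p: (1.0 - damping) / n for p in nodes}
        withheld = 0.0
        for page in nodes:
            targets = links.get(page, [])
            if targets and not is_suspect(page):
                # Trusted page: split its damped rank over its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Suspect or dangling page: withhold its link votes.
                withheld += damping * rank[page]
        for p in nodes:
            new_rank[p] += withheld / n
        rank = new_rank
    return rank


# Toy graph: three keyword-stuffed pages form a link farm aimed at "target".
pages = {
    "a": "tips on diet sleep and regular exercise",
    "c": "how to read a nutrition label",
    "f1": "cheap pills cheap pills cheap pills",
    "f2": "buy pills now buy pills now pills",
    "f3": "pills pills pills online",
    "target": "order prescription drugs from our online pharmacy",
}
links = {
    "a": ["c"],
    "c": ["a"],
    "f1": ["target"],
    "f2": ["target"],
    "f3": ["target"],
    "target": [],
}

plain = heuristic_pagerank(links, lambda p: False)
flagged = heuristic_pagerank(links, lambda p: keyword_stuffing(pages[p]))
print(f"target: plain={plain['target']:.3f}, with heuristic={flagged['target']:.3f}")
```

In the plain run the farm lifts `target` well above the pages that link to it; with the heuristic enabled, the farm's votes are withheld and `target` falls to roughly the score of a page with no in-links. A word-frequency test like this one is easy to evade and can misfire on very short pages, so it stands in only for the idea of gating link votes on a content signal.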