Topical TrustRank: using topicality to combat web spam

Authors:
Baoning Wu; Vinay Goel; Brian D. Davison
Affiliations:
Lehigh University, Bethlehem, PA; Lehigh University, Bethlehem, PA; Lehigh University, Bethlehem, PA
Venue:
Proceedings of the 15th international conference on World Wide Web
Year:
2006

Abstract

Web spam is behavior that attempts to deceive search engine ranking algorithms. TrustRank is a recent algorithm for combating web spam. However, TrustRank is vulnerable in that its seed set may not be representative enough to cover the different topics on the Web well. Also, for a given seed set, TrustRank is biased towards larger communities. To address these issues, we propose using topical information to partition the seed set and calculating trust scores for each topic separately. A combination of these trust scores for a page is then used to determine its ranking. Experimental results on two large datasets show that our Topical TrustRank performs better than TrustRank in demoting spam sites or pages. Compared to TrustRank, our best technique can decrease spam from the top ranked sites by as much as 43.1%.
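
The idea of partitioning the seed set by topic and combining per-topic trust scores can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the scores are combined, so the unweighted sum used here is an assumption, as are the decay factor of 0.85, the fixed iteration count, the per-topic normalization of the jump vector, and the toy graph with its node names. The function names `trust_rank` and `topical_trust_rank` are hypothetical.

```python
from collections import defaultdict


def trust_rank(out_links, seeds, alpha=0.85, iterations=50):
    """Biased PageRank in which random jumps land only on seed nodes.

    out_links maps each node to the list of nodes it links to.
    alpha and iterations are illustrative defaults, not values from the paper.
    """
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    seeds = [s for s in seeds if s in nodes]

    # Jump vector: uniform over the seeds, zero elsewhere.
    jump = {n: 0.0 for n in nodes}
    for s in seeds:
        jump[s] = 1.0 / len(seeds)

    trust = dict(jump)
    for _ in range(iterations):
        nxt = {n: (1.0 - alpha) * jump[n] for n in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if not targets:
                continue  # dangling node: its trust is dropped in this sketch
            share = alpha * trust[u] / len(targets)
            for v in targets:
                nxt[v] += share
        trust = nxt
    return trust


def topical_trust_rank(out_links, seeds_by_topic, alpha=0.85, iterations=50):
    """Run trust propagation once per topic, then combine the scores.

    Assumption: the combination is a plain sum over topics.
    """
    per_topic = {
        topic: trust_rank(out_links, seeds, alpha, iterations)
        for topic, seeds in seeds_by_topic.items()
    }
    combined = defaultdict(float)
    for scores in per_topic.values():
        for node, score in scores.items():
            combined[node] += score
    return dict(combined), per_topic


# Hypothetical toy graph: two topical communities and an isolated spam pair.
toy_links = {
    "news_seed": ["news_a", "news_b"],
    "news_a": ["news_b"],
    "news_b": ["news_a"],
    "health_seed": ["health_a"],
    "health_a": ["health_seed"],
    "spam_1": ["spam_2"],
    "spam_2": ["spam_1"],
}
toy_seeds = {"news": ["news_seed"], "health": ["health_seed"]}

combined_scores, topic_scores = topical_trust_rank(toy_links, toy_seeds)
for node in sorted(combined_scores, key=combined_scores.get, reverse=True):
    print(f"{node}: {combined_scores[node]:.4f}")
```

Because each topic's jump vector is normalized on its own in this sketch, a small topic contributes as much total trust as a large one, which is one simple way to counter a bias towards larger communities. Pages that no seed can reach, such as the spam pair in the toy graph, receive no trust under any topic.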