Using evidence based content trust model for spam detection

Authors:
Wei Wang;Guosun Zeng;Daizhong Tang
Affiliations:
Department of Computer Science and Engineering, Tongji University, Shanghai 200092, China and Tongji Branch National Engineering and Technology Center of High Performance, Shanghai 200092, China a ...;Department of Computer Science and Engineering, Tongji University, Shanghai 200092, China and Tongji Branch National Engineering and Technology Center of High Performance, Shanghai 200092, China a ...;School of Economics and Management, Tongji University, Shanghai 200092, China
Venue:
Expert Systems with Applications: An International Journal
Year:
2010

Abstract

Content trust is one of the main components of information retrieval research. As it gets easier to add information to the Web via HTML pages, wikis, blogs, and other documents, it gets harder to distinguish accurate, trustworthy information from inaccurate or untrustworthy information. Current spam detection technology is based on a binary metric; that is, spam detection is treated as binary classification. To meet users' needs and preferences, a more accurate metric is needed, both for content trust and for detecting spam. In this paper, we apply the notion of content trust to spam detection and treat it as a ranking problem. Besides traditional text features, evidence based on information quality is introduced to define the trust features of spam information, and a novel content trust learning algorithm based on this evidence is proposed. Finally, a Web spam detection system is developed and experiments on real Web data are carried out, which show that the proposed method performs very well in practice.