Content trust is one of the main components of information retrieval research. As it becomes easier to add information to the Web through HTML pages, wikis, blogs, and other documents, it becomes harder to distinguish accurate, trustworthy information from inaccurate or untrustworthy information. Current spam detection technology relies on a binary metric: each page is classified as either spam or not spam. To meet users' needs and preferences, a finer-grained metric is needed both for content trust and for spam detection. In this paper, we apply the notion of content trust to spam detection and treat it as a ranking problem. In addition to traditional text features, we introduce evidence based on information quality to define the trust features of spam information, and we propose a novel content trust learning algorithm based on this evidence. Finally, we develop a Web spam detection system and carry out experiments on real Web data, which show that the proposed method performs very well in practice.
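To make the contrast between binary classification and ranking concrete, the following is a minimal sketch of learning a trust ordering from pairwise preferences. It is not the learning algorithm proposed in the paper: it swaps in a simple perceptron-style update on feature differences, and the two features (`keyword_stuffing_ratio`, `cited_sources_score`), the page names, and their values are hypothetical illustrations of a text feature and an information-quality feature.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def train_pairwise_ranker(
    pairs: List[Tuple[Vector, Vector]],
    n_features: int,
    epochs: int = 20,
    lr: float = 0.1,
) -> List[float]:
    """Learn weights so that the first item of each pair scores higher.

    Each pair is (more_trusted_features, less_trusted_features).
    This is a perceptron-style update, used here only as an illustration.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin <= 0:  # pair is tied or ordered wrongly: adjust weights
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def trust_score(w: Vector, x: Vector) -> float:
    """Linear trust score; higher means more trusted."""
    return sum(wi * xi for wi, xi in zip(w, x))


def rank_pages(w: Vector, pages: Dict[str, Vector]) -> List[str]:
    """Return page names ordered from most to least trusted."""
    return sorted(pages, key=lambda name: trust_score(w, pages[name]), reverse=True)


# Hypothetical features: [keyword_stuffing_ratio, cited_sources_score]
pages = {
    "reference": [0.1, 1.0],
    "blog": [0.4, 0.5],
    "spam": [0.9, 0.0],
}
# Hypothetical preference judgments: the first page is more trusted than the second.
pairs = [
    (pages["reference"], pages["blog"]),
    (pages["blog"], pages["spam"]),
]

w = train_pairwise_ranker(pairs, n_features=2)
print(rank_pages(w, pages))
```

Unlike a binary spam/not-spam label, the learned score orders every page relative to the others, so a threshold can still be applied afterwards if a hard decision is required.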