Content-based trust and bias classification via biclustering

Authors:
Dávid Siklósi; Bálint Daróczy; András A. Benczúr
Affiliations:
Institute for Computer Science and Control, Hungarian Academy of Sciences; Institute for Computer Science and Control, Hungarian Academy of Sciences; Institute for Computer Science and Control, Hungarian Academy of Sciences
Venue:
Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality
Year:
2012

Abstract

In this paper we improve trust, bias and factuality classification of Web data at the domain level. Unlike most of the literature in this area, which aims at extracting opinion and handling short text at the micro level, we aim to help a researcher or an archivist obtain a large collection that, at a high level, originates from unbiased and trustworthy sources. Our method generates features as Jensen-Shannon distances from the cluster centers of a host-term biclustering. On top of these distance features we apply kernel methods and also combine them with baseline text classifiers. We test our method on DC2010, the ECML/PKDD Discovery Challenge 2010 data set. Our method improves over the best text classification NDCG results by 3 to 10% for neutrality, bias and trustworthiness. The fact that the ECML/PKDD Discovery Challenge 2010 participants reached an AUC only slightly above 0.5 indicates the difficulty of the task.
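
To make the distance features concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each host is represented by a vector of term counts, that each cluster center is an aggregated term-count vector over the same vocabulary, and that "Jensen-Shannon distance" means the square root of the base-2 Jensen-Shannon divergence; the abstract does not specify any of these details. The function names and the toy data are hypothetical.

```python
import math


def normalize(counts):
    """Turn non-negative term counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("count vector must have a positive sum")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (assumed definition).

    With base-2 logarithms the value lies in [0, 1].
    """
    # The mixture m is positive wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


def distance_features(host_counts, center_counts):
    """One feature per cluster center: JS distance from the host to the center."""
    p = normalize(host_counts)
    return [js_distance(p, normalize(c)) for c in center_counts]


# Hypothetical toy data: 4 vocabulary terms, 2 cluster centers.
centers = [[10, 10, 0, 0], [0, 0, 5, 15]]
host = [3, 1, 0, 0]
features = distance_features(host, centers)
```

The resulting vector has one entry per cluster center and could then be passed to a kernel classifier or combined with baseline text features, as the abstract describes. In this toy case the host shares no terms with the second center, so that feature takes the maximum value of 1.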