Multi-source deep learning for information trustworthiness estimation

Authors:
Liang Ge;Jing Gao;Xiaoyi Li;Aidong Zhang
Affiliations:
The State University of New York at Buffalo, Buffalo, New York, USA;The State University of New York at Buffalo, Buffalo, New York, USA;The State University of New York at Buffalo, Buffalo, New York, USA;The State University of New York at Buffalo, Buffalo, New York, USA
Venue:
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2013



Abstract

In recent years, information trustworthiness has become a serious issue as user-generated content has come to prevail in our information world. In this paper, we investigate the important problem of estimating information trustworthiness by correlating and comparing multiple data sources. To a certain extent, the degree of consistency is an indicator of information reliability: information that all sources unanimously agree on is more likely to be reliable. Based on this principle, we develop an effective computational approach to identify consistent information from multiple data sources. In particular, we analyze vast amounts of information collected from multiple review platforms (multiple sources) on which people can rate and review the items they have purchased. The major challenge is that different platforms attract diverse sets of users, and thus information cannot be compared directly at the surface level. However, the latent reasons hidden in user ratings are mostly shared across sources, so inconsistency about an item appears only when some source provides ratings that deviate from the common latent reasons. Therefore, we propose a novel two-step procedure to calculate information consistency degrees for a set of items that are rated by multiple sets of users on different platforms. We first build a Multi-Source Deep Belief Network (MSDBN) to identify the common reasons hidden in multi-source rating data, and then calculate a consistency score for each item by comparing the individual sources with the reconstructed data derived from the latent reasons. We conduct experiments on real user ratings collected from Orbitz, Priceline and TripAdvisor for all the hotels in Las Vegas and New York City. Experimental results demonstrate that the proposed approach successfully finds the hotels that receive inconsistent, and possibly unreliable, ratings.
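
The second step of the procedure, scoring each item by how far the individual sources deviate from the data reconstructed from the shared latent reasons, can be illustrated with a minimal sketch. This is not the paper's formula: the abstract does not state one, so the score used here (the reciprocal of one plus the mean absolute deviation across sources) is an assumption chosen only for illustration. The function name `consistency_scores`, the source and item names, and all rating values are hypothetical toy inputs, and the reconstructed values are simply supplied by hand where a trained MSDBN would normally produce them.

```python
def consistency_scores(observed, reconstructed):
    """Score each item by agreement between observed and reconstructed ratings.

    observed[source][item]      -> average rating the source gives the item
    reconstructed[source][item] -> rating rebuilt from shared latent factors
                                   (assumed to come from a trained model)
    Returns item -> score in (0, 1]; higher means more consistent.
    The scoring rule is an illustrative assumption, not the paper's.
    """
    # Only items rated by every source can be compared across sources.
    items = set.intersection(*(set(ratings) for ratings in observed.values()))
    scores = {}
    for item in items:
        deviations = [
            abs(observed[source][item] - reconstructed[source][item])
            for source in observed
        ]
        mean_deviation = sum(deviations) / len(deviations)
        scores[item] = 1.0 / (1.0 + mean_deviation)
    return scores


# Hypothetical toy data: three sources, two items.
observed = {
    "source_1": {"hotel_a": 4.0, "hotel_b": 4.5},
    "source_2": {"hotel_a": 4.0, "hotel_b": 2.0},
    "source_3": {"hotel_a": 4.0, "hotel_b": 4.5},
}
# Hand-written stand-in for the model's reconstruction.
reconstructed = {
    "source_1": {"hotel_a": 4.0, "hotel_b": 4.0},
    "source_2": {"hotel_a": 4.0, "hotel_b": 4.0},
    "source_3": {"hotel_a": 4.0, "hotel_b": 4.0},
}

scores = consistency_scores(observed, reconstructed)
# Items sorted from least to most consistent.
ranking = sorted(scores, key=scores.get)
```

In this toy input, `hotel_b` ranks as the less consistent item because one source departs sharply from the reconstruction, which mirrors the idea that inconsistency shows up when a source deviates from the common latent reasons.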