Estimating Local Information Trustworthiness via Multi-source Joint Matrix Factorization

  • Authors:
  • Liang Ge;Jing Gao;Xiao Yu;Wei Fan;Aidong Zhang

  • Venue:
  • ICDM '12 Proceedings of the 2012 IEEE 12th International Conference on Data Mining
  • Year:
  • 2012

Abstract

We investigate how to estimate information trustworthiness by considering multiple information sources jointly in a latent matrix space. We focus in particular on user review and recommendation systems, since there are multiple platforms where people can rate items and services that they have purchased, and many potential customers rely on these opinions to make decisions. Information trustworthiness is a serious problem because ratings are generated freely by end users, so many spammers take advantage of freedom of speech to promote their own business or damage the reputation of competitors. We propose to use only customer ratings to estimate each individual source's reliability by exploring correlations among multiple sources. Ratings of items are provided by users of diverse tastes and styles, and thus may appear noisy and conflicting across sources; however, they share some underlying common behavior. Therefore, we can group users based on their opinions, and a source is reliable on an item if its opinions given by latent groups are consistent across platforms. Inspired by this observation, we solve the problem with a two-step model: a joint matrix factorization procedure followed by reliability score computation. We propose two effective approaches to decompose rating matrices as the products of group membership and group rating matrices, and then compute consistency degrees from the group rating matrices as source reliability scores. We conduct experiments on both synthetic data and real user ratings collected from Orbitz, Priceline and Trip Advisor on all the hotels in Las Vegas and New York City. Results show that the proposed method gives accurate estimates of source reliability and thus successfully identifies inconsistent, conflicting and unreliable information.
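
To make the two-step idea in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the authors' algorithm: the abstract does not give the objective or the update rules, so everything specific here is an assumption made for illustration. That includes the function names `joint_factorize` and `reliability_scores`, the parameters `lam` and `mu`, the coupling of each source's group rating matrix to a shared consensus matrix, the alternating ridge-regression updates, and the `1 / (1 + distance)` score. It also assumes every source rates the same items and has no missing ratings.

```python
import numpy as np


def joint_factorize(ratings, n_groups=2, lam=1.0, mu=0.1, n_iters=200, seed=0):
    """Assumed joint factorization: R_s ~ W_s @ H_s for every source s.

    ratings: list of (users_s x items) arrays, one per source, same items.
    W_s is a (users_s x groups) membership matrix, H_s a (groups x items)
    group rating matrix. A penalty lam * ||H_s - H_bar||^2 pulls every H_s
    toward a shared consensus H_bar; mu is a ridge penalty on W_s.
    """
    rng = np.random.default_rng(seed)
    n_items = ratings[0].shape[1]
    # Shared starting point so latent groups begin aligned across sources.
    h_init = rng.random((n_groups, n_items))
    H = [h_init.copy() for _ in ratings]
    H_bar = h_init.copy()
    eye = np.eye(n_groups)
    for _ in range(n_iters):
        # Ridge update: W_s = R_s H_s^T (H_s H_s^T + mu I)^{-1}
        W = [np.linalg.solve(h @ h.T + mu * eye, h @ r.T).T
             for r, h in zip(ratings, H)]
        # Ridge update: (W_s^T W_s + lam I) H_s = W_s^T R_s + lam H_bar
        H = [np.linalg.solve(w.T @ w + lam * eye, w.T @ r + lam * H_bar)
             for r, w in zip(ratings, W)]
        # Consensus group ratings: plain average over sources.
        H_bar = np.mean(H, axis=0)
    return W, H, H_bar


def reliability_scores(H, H_bar):
    """Assumed score: 1 / (1 + distance of a source's group ratings for an
    item from the consensus). Returns an array of shape (sources, items)."""
    dist = np.array([np.linalg.norm(h - H_bar, axis=0) for h in H])
    return 1.0 / (1.0 + dist)


if __name__ == "__main__":
    # Toy data: two user groups, six items, three sources.
    members = np.repeat(np.eye(2), 3, axis=0)          # 6 users per source
    group_ratings = np.array([[5, 1, 4, 2, 3, 5],
                              [1, 5, 2, 4, 5, 2]], dtype=float)
    clean = members @ group_ratings
    tampered = clean.copy()
    tampered[:, 0] = 6 - tampered[:, 0]                # flip ratings of item 0
    _, H, H_bar = joint_factorize([clean, clean.copy(), tampered])
    print(np.round(reliability_scores(H, H_bar), 2))
```

In the toy run, the third source has the ratings of item 0 flipped, so its group ratings for that item drift away from the consensus and its score on that item falls below the score of an untampered source. Real review data would additionally need handling for missing ratings and for items not covered by every platform, which this sketch leaves out.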