On clustering heterogeneous social media objects with outlier links

  • Authors:
  • Guo-Jun Qi;Charu C. Aggarwal;Thomas S. Huang

  • Affiliations:
  • University of Illinois at Urbana-Champaign, Urbana, IL, USA;IBM T.J. Watson Research Center, Hawthorne, NY, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA

  • Venue:
  • Proceedings of the fifth ACM international conference on Web search and data mining
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

The clustering of social media objects provides intrinsic understanding of the similarity relationships between documents, images, and their contextual sources. Both content and link structure provide important cues for an effective clustering algorithm of the underlying objects. While link information provides useful hints for improving the clustering process, it also contains a significant amount of noisy information. Therefore, a robust clustering algorithm is required to reduce the impact of noisy links. In order to address the aforementioned problems, we propose heterogeneous random fields to model the structure and content of social media networks. We design a probability measure on the social media networks which output a configuration of clusters that are consistent with both content and link structure. Furthermore, noisy links can also be detected, and their impact on the clustering algorithm can be significantly reduced. We conduct experiments on a real social media network and show the advantage of the method over other state-of-the-art algorithms.