On clustering heterogeneous social media objects with outlier links

Authors:
Guo-Jun Qi;Charu C. Aggarwal;Thomas S. Huang
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL, USA;IBM T.J. Watson Research Center, Hawthorne, NY, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the fifth ACM international conference on Web search and data mining
Year:
2012

Abstract

Clustering social media objects reveals the similarity relationships among documents, images, and their contextual sources. Both content and link structure provide important cues for clustering the underlying objects effectively. Link information offers useful hints for the clustering process, but it also contains a significant amount of noise, so a robust clustering algorithm is required to reduce the impact of noisy links. To address this problem, we propose heterogeneous random fields to model the structure and content of social media networks. We design a probability measure on the social media network that outputs a configuration of clusters consistent with both content and link structure. Noisy links can also be detected, and their impact on the clustering can be significantly reduced. We conduct experiments on a real social media network and show the advantage of the method over other state-of-the-art algorithms.
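
The general idea of combining a content term with a link term, and then flagging links that the final clustering does not satisfy, can be illustrated with a toy sketch. This is not the heterogeneous random field model proposed in the paper: it assumes a plain Potts-style link penalty over a single object type, squared Euclidean content cost, and a greedy iterated-conditional-modes update. The function name `cluster_with_link_outliers` and the parameters `link_weight` and `n_iter` are hypothetical and chosen only for illustration.

```python
import numpy as np

def cluster_with_link_outliers(X, edges, init_centers, link_weight=1.0, n_iter=20):
    """Toy clustering with a content term and a Potts link term (illustrative only)."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    n, k = len(X), len(centers)

    # Build an undirected adjacency list from the edge list.
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    # Initialize labels from content alone (nearest center).
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)

    for _ in range(n_iter):
        changed = False
        for i in range(n):
            # Content cost: squared distance to each cluster center.
            content_cost = ((X[i] - centers) ** 2).sum(axis=1)
            # Link cost: number of neighbors that would disagree with each label.
            link_cost = np.array(
                [sum(labels[j] != c for j in nbrs[i]) for c in range(k)],
                dtype=float,
            )
            best = int(np.argmin(content_cost + link_weight * link_cost))
            if best != labels[i]:
                labels[i] = best
                changed = True
        # Re-estimate centers from the current assignment.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
        if not changed:
            break

    # Links whose endpoints land in different clusters are candidate noisy links.
    outlier_links = [(i, j) for i, j in edges if labels[i] != labels[j]]
    return labels, outlier_links

# Two tight content groups joined by one cross-group link (2, 3).
X = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels, outlier_links = cluster_with_link_outliers(X, edges, [[0, 0], [5, 5]])
print(labels.tolist(), outlier_links)
```

In this toy setting, the cross-group link is outweighed by content and by the within-group links, so it ends up flagged rather than pulling its endpoints into one cluster. A probabilistic model such as the one described in the abstract would replace the hard assignments and the fixed penalty with a probability measure over cluster configurations.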