Hierarchical comments-based clustering

  • Authors:
  • Chiao-Fang Hsu; James Caverlee; Elham Khabiri

  • Affiliations:
  • Texas A&M University, College Station, TX; Texas A&M University, College Station, TX; Texas A&M University, College Station, TX

  • Venue:
  • Proceedings of the 2011 ACM Symposium on Applied Computing
  • Year:
  • 2011

Abstract

Information resources on the Web, such as videos, images, and documents, are becoming increasingly "social" through user engagement via commenting systems. These systems give users a forum for discussing the resources and, as a side effect, supply valuable editorial and contextual information about them. In this paper, we explore a comments-driven clustering framework for organizing Web resources according to this user-based perspective. Concretely, we propose a hierarchical comment clustering approach that relies on two key features: (i) comment term normalization and key term extraction, which distill noisy comments for effective clustering; and (ii) a real-time insertion component that incrementally updates the comments-based hierarchy, so that resources can be placed efficiently as comments arrive, without regenerating the (potentially expensive) hierarchy. We study the approach on the popular video-sharing site YouTube, a challenging environment notorious for its extremely short, ill-formed, and often unintelligible user-contributed comments. Through extensive experiments, we find that the proposed approach can lead to effective and efficient comments-based video organization even in a YouTube-like environment.
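
The two features named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm, whose details the abstract does not give. Everything here is an assumption made for illustration: the tiny stopword list, squeezing runs of repeated letters as the normalization step, plain term frequency as the key term score, cosine similarity against summed centroids, the 0.3 threshold, and the greedy top-down descent used for insertion. The names `normalize`, `key_terms`, `Node`, and `insert` are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not taken from the paper.
STOPWORDS = {"the", "a", "is", "this", "so", "i", "and", "to", "of", "it"}

def normalize(comment):
    """Lowercase, squeeze runs of 3+ identical characters to 2, keep word tokens."""
    text = re.sub(r"(.)\1{2,}", r"\1\1", comment.lower())
    return [t for t in re.findall(r"[a-z]+", text)
            if t not in STOPWORDS and len(t) > 1]

def key_terms(comments, k=5):
    """Assumed scoring: the k most frequent normalized terms of one resource."""
    counts = Counter(t for c in comments for t in normalize(c))
    return dict(counts.most_common(k))

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Node:
    """One cluster in the hierarchy; centroid sums the vectors placed below it."""
    def __init__(self):
        self.centroid = Counter()
        self.children = []
        self.resources = []

def insert(root, resource_id, vector, threshold=0.3):
    """Greedy descent: follow the most similar child while similarity >= threshold.

    Only centroids along the path are updated, so the hierarchy is never rebuilt.
    """
    node = root
    node.centroid.update(vector)
    while node.children:
        best = max(node.children, key=lambda c: cosine(vector, c.centroid))
        if cosine(vector, best.centroid) < threshold:
            break
        node = best
        node.centroid.update(vector)
    if node is root or node.children:
        # No sufficiently similar cluster at this level: open a new leaf here.
        leaf = Node()
        leaf.centroid.update(vector)
        node.children.append(leaf)
        node = leaf
    node.resources.append(resource_id)
    return node

# Usage with made-up comments (not data from the paper).
root = Node()
insert(root, "vid1", key_terms(["Coooool guitar solo!!!", "best guitar solo ever"]))
insert(root, "vid2", key_terms(["amazing guitar solo"]))
insert(root, "vid3", key_terms(["cute kitten video", "kitten sooo cute"]))
for child in root.children:
    print(child.resources)
```

Each insertion in this sketch touches only the nodes on one root-to-leaf path, which is the property that makes incremental placement cheaper than re-clustering. A real system would likely need a stronger term weighting scheme and a policy for splitting or merging clusters, neither of which is modeled here.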