Structure-aware topic clustering in social media

  • Authors:
  • Julien Dubuc; Sabine Bergler

  • Affiliations:
  • Concordia University, Montreal, PQ, Canada; Concordia University, Montreal, PQ, Canada

  • Venue:
  • Proceedings of the 10th ACM symposium on Document engineering
  • Year:
  • 2010

Abstract

The rapid evolution and growth of social media software have enabled hundreds of millions of people to interact within on-line communities on a global scale. Although these platforms enable communication through a common set of metaphors, such as discussion threads and quoted text in replies, they represent discussions in a variety of divergent ways. Because the meaning of a conversation is defined not only by the content of each piece of text but also by the relationships between pieces of text, part of that meaning is hidden from automated processing. Search engines, which act as gateways for outsiders into a community's social text, are reduced to giving an incomplete picture. This paper proposes a model for representing both the content and the structure of social text in a consistent way, enabling automated processing of the structure of a discussion along with its text content. It also describes a method for indexing text that uses this structural information to provide meaningful contexts for paragraphs of interest. Finally, it describes a method for clustering text content into topic groups that builds on this indexing method and uses the social structure to make informed decisions about which pieces of text can be meaningfully compared.
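
The abstract does not give implementation details, so the following is only an illustrative sketch of the general idea, not the authors' model or algorithm. It assumes a discussion can be reduced to posts with a reply link and optional per-paragraph quote links, and that a paragraph's "context" is the paragraph plus the text it structurally responds to. All names (`Post`, `context_for`, `bag_of_words`, `cosine`) and the sample thread are hypothetical, and plain bag-of-words cosine similarity stands in for whatever comparison the paper actually uses.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt
from typing import Dict, List, Optional


@dataclass
class Post:
    """Hypothetical unified representation of one post in a thread."""
    post_id: str
    paragraphs: List[str]
    reply_to: Optional[str] = None  # id of the parent post, if any
    # paragraph index -> id of the post that paragraph quotes
    quotes: Dict[int, str] = field(default_factory=dict)


def context_for(posts: Dict[str, Post], post_id: str, para_idx: int) -> str:
    """Return a paragraph joined with the text it structurally responds to."""
    post = posts[post_id]
    parts = [post.paragraphs[para_idx]]
    # Prefer an explicit quote link; otherwise fall back to the thread parent.
    source_id = post.quotes.get(para_idx, post.reply_to)
    if source_id is not None and source_id in posts:
        parts.extend(posts[source_id].paragraphs)
    return " ".join(parts)


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts with surrounding punctuation stripped."""
    tokens = (w.strip(".,!?;:").lower() for w in text.split())
    return Counter(t for t in tokens if t)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Assumed sample thread: p2 answers p1, and p3 quotes p2 while replying to p1.
posts = {
    "p1": Post("p1", ["Which lens works best for night sky photos?"]),
    "p2": Post("p2", ["A fast wide lens works best."], reply_to="p1"),
    "p3": Post("p3", ["I agree with that."], reply_to="p1", quotes={0: "p2"}),
}

answer = bag_of_words(" ".join(posts["p2"].paragraphs))
bare_similarity = cosine(bag_of_words(posts["p3"].paragraphs[0]), answer)
contextual_similarity = cosine(bag_of_words(context_for(posts, "p3", 0)), answer)
print(bare_similarity, contextual_similarity)
```

In this toy thread, "I agree with that." shares no vocabulary with the answer it endorses, so a content-only comparison cannot relate the two. Once the quoted text is attached as context, the paragraph becomes comparable to the post it responds to, which is the kind of gain the abstract attributes to using structure alongside content.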