Using the overlapping community structure of a network of tags to improve text clustering

  • Authors: Nuno Cravino; José Devezas; Álvaro Figueira

  • Affiliations: Universidade do Porto, Porto, Portugal (all authors)

  • Venue: Proceedings of the 23rd ACM Conference on Hypertext and Social Media
  • Year: 2012

Abstract

Breadcrumbs is a folksonomy of news clips, where users can aggregate fragments of text taken from online news. Besides its textual content, each news clip carries a set of associated metadata fields, of which user-defined tags are among the most important. Based on a small data set of news clips, we build a network of tag co-occurrences in news clips and use it to improve text clustering. We do this by defining a weighted cosine similarity measure that takes into account both the clip vectors and the tag vectors. The tag weight is computed using the related tags that are present in the discovered community. We then use the resulting vectors together with the new proximity measure, which allows us to identify socially biased document clusters. Our study indicates that using the structural features of the network of tags has a positive impact on the clustering process.
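
The abstract does not give the exact formulas, so the following is only a minimal sketch of the general idea, under stated assumptions. It counts tag co-occurrences across clips, expands a clip's tag vector with related tags from its communities, and combines text and tag cosine similarity. The function names, the convex combination controlled by `alpha`, and the `related_weight` value are all hypothetical choices made for illustration, not the authors' method; the communities are assumed to be supplied by some separate overlapping community detection step.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def tag_cooccurrence(clip_tags):
    """Count how often each unordered pair of tags appears on the same clip.

    The result maps (tag_a, tag_b) with tag_a < tag_b to an edge weight.
    """
    edges = Counter()
    for tags in clip_tags:
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges


def community_tag_vector(tags, communities, related_weight=0.5):
    """Build a tag vector for one clip (hypothetical weighting scheme).

    The clip's own tags get weight 1.0. Any other tag that shares a
    community with one of them gets `related_weight`. Communities may
    overlap, so a tag can belong to several of them.
    """
    own = set(tags)
    vector = {t: 1.0 for t in own}
    for community in communities:
        if own & community:
            for t in community - own:
                vector[t] = related_weight
    return vector


def weighted_similarity(text_a, text_b, tags_a, tags_b, alpha=0.5):
    """Assumed convex combination of text and tag cosine similarity."""
    return alpha * cosine(text_a, text_b) + (1.0 - alpha) * cosine(tags_a, tags_b)
```

With this weighting, two clips that share no tags can still have a nonzero tag similarity when their tags fall in the same community, which is one plausible way for the network structure to influence the clustering.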