SimSpectrum: a similarity based spectral clustering approach to generate a tag cloud

  • Authors:
  • Frederico Durao;Peter Dolog;Martin Leginus;Ricardo Lage

  • Affiliations:
  • IWIS -- Intelligent Web and Information Systems, Computer Science Department, Aalborg University, Aalborg-East, Denmark (all authors)

  • Venue:
  • ICWE'11 Proceedings of the 11th international conference on Current Trends in Web Engineering
  • Year:
  • 2011

Abstract

Tag clouds are a means of navigating and exploring information resources on social Web sites. The most widely used approach to generating a tag cloud is based on the popularity of tags among the users who annotate resources with them. This approach has several limitations, however: it suppresses tags that are rarely used but could lead to interesting resources, and it drops tags that fall outside the default number of tags shown in the cloud. In this paper we propose SimSpectrum, a similarity-based spectral clustering approach to generating a tag cloud that improves on the current state of the art with respect to these limitations. Our approach first determines the extent to which tags are related by means of a similarity calculation. Based on the resulting similarities, the spectral clustering algorithm finds clusters of tags that are strongly related to one another and only loosely related to the remaining tags. In this way we can cover part of the tags that traditional tag cloud generation approaches discard, and therefore present the user with more opportunities to find related, interesting web resources. We also show that, in terms of metrics that capture the structural properties of a tag cloud, such as coverage and relevance, our method yields significant results compared to the baseline tag cloud that relies on tag popularity. In terms of the overlap measure, our method shows improvements over the baseline approach. The proposed approach is evaluated on the MedWorm medical article collection.
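
The two steps the abstract describes, a tag similarity calculation followed by spectral clustering, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity measure or the clustering variant, so the sketch assumes cosine similarity over a tag-by-resource annotation matrix and a two-way split on the second eigenvector of the normalized graph Laplacian. The tag names, the toy matrix, and the function names are hypothetical.

```python
import numpy as np


def cosine_similarity(tag_resource):
    """Pairwise cosine similarity between the rows (tags) of a tag-by-resource matrix."""
    norms = np.linalg.norm(tag_resource, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for unused tags
    unit = tag_resource / norms
    return unit @ unit.T


def spectral_bipartition(similarity):
    """Split tags into two clusters using the normalized graph Laplacian.

    Assumption: a two-way split is enough for illustration; a full method
    would typically use several eigenvectors and k clusters.
    """
    w = similarity.copy()
    np.fill_diagonal(w, 0.0)  # ignore self-similarity
    degree = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, eigenvectors = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    # The second eigenvector, rescaled by D^{-1/2}, separates the two groups by sign.
    fiedler = eigenvectors[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)


# Hypothetical toy data: rows are tags, columns are resources (1 = tag annotates resource).
tags = ["diabetes", "insulin", "glucose", "heart", "cardiac", "stroke"]
tag_resource = np.array(
    [
        [1, 1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 0, 0],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 1, 0],
    ],
    dtype=float,
)

similarity = cosine_similarity(tag_resource)
labels = spectral_bipartition(similarity)

# Group tag names by cluster label; which group gets label 0 or 1 is arbitrary.
clusters = {}
for tag, label in zip(tags, labels):
    clusters.setdefault(int(label), []).append(tag)
print(clusters)
```

In this toy matrix, "glucose" shares one resource with the second group, so the similarity graph is connected and the split is driven by the strong within-group similarities. A tag cloud could then draw tags from each cluster rather than only from the most popular tags, which is the kind of coverage gain the abstract argues for.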