Using the overlapping community structure of a network of tags to improve text clustering

Authors:
Nuno Cravino; José Devezas; Álvaro Figueira
Affiliations:
Universidade do Porto, Porto, Portugal; Universidade do Porto, Porto, Portugal; Universidade do Porto, Porto, Portugal
Venue:
Proceedings of the 23rd ACM conference on Hypertext and social media
Year:
2012

Abstract

Breadcrumbs is a folksonomy of news clips, where users can aggregate fragments of text taken from online news. Besides its textual content, each news clip carries a set of metadata fields, and user-defined tags are among the most important of these. From a small data set of news clips, we build a network of tag co-occurrence in news clips and use it to improve text clustering. To do so, we define a weighted cosine similarity measure that takes into account both the clip vectors and the tag vectors. The weight of each tag is computed from the related tags present in the discovered community. We then cluster the resulting vectors using this new similarity measure, which allows us to identify socially biased document clusters. Our study indicates that using the structural features of the tag network has a positive impact on the clustering process.
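The abstract does not give the exact weighting formula, so the Python below is only a minimal sketch of the general idea under stated assumptions. Tag co-occurrence is counted per clip. Communities are supplied as a list of possibly overlapping tag sets, for instance the output of an overlapping community detection algorithm such as SLPA. The weight of a tag within a clip is *assumed* to be 1 plus the number of the clip's other tags that share a community with it. The text cosine and the weighted-tag cosine are *assumed* to be mixed linearly through a hypothetical parameter `alpha`. All function names are hypothetical and do not come from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_network(clips_tags):
    """Edge weights of the tag network: how often each unordered tag pair
    appears together in the same clip."""
    edges = Counter()
    for tags in clips_tags:
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges


def tag_weights(tags, communities):
    """Hypothetical weighting: 1 + number of the clip's other tags that share
    at least one (possibly overlapping) community with the tag."""
    tags = set(tags)
    weights = {}
    for t in tags:
        # All tags that appear in some community together with t.
        related = {u for c in communities if t in c for u in c if u != t}
        weights[t] = 1.0 + len(related & tags)
    return weights


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def combined_similarity(text_a, text_b, tags_a, tags_b, communities, alpha=0.5):
    """Hypothetical mix: alpha * text cosine + (1 - alpha) * weighted-tag cosine."""
    ta = tag_weights(tags_a, communities)
    tb = tag_weights(tags_b, communities)
    return alpha * cosine(text_a, text_b) + (1 - alpha) * cosine(ta, tb)


if __name__ == "__main__":
    clips = [{"economy", "euro", "crisis"}, {"euro", "football"}, {"economy", "crisis"}]
    print(cooccurrence_network(clips))
    communities = [{"economy", "euro", "crisis"}, {"euro", "football"}]  # "euro" overlaps
    print(combined_similarity({"euro": 1.0}, {"euro": 1.0},
                              {"economy", "crisis"}, {"football"}, communities))
```

Because all weights here are non-negative, the combined value lies in [0, 1]. A clustering algorithm that expects a distance rather than a similarity could therefore use `1 - combined_similarity(...)`.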