ICDMW '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining Workshops
Breadcrumbs is a folksonomy of news clips in which users aggregate fragments of text taken from online news. Besides its textual content, each news clip carries a set of metadata fields, of which user-defined tags are among the most important. Using a small data set of news clips, we build a network of tag co-occurrence and use it to improve text clustering. We do this by defining a weighted cosine similarity measure that takes into account both the clip vectors and the tag vectors. The weight of each tag is computed from the related tags present in the community discovered in the tag network. We then use the resulting vectors together with the new proximity measure, which allows us to identify socially biased document clusters. Our study indicates that exploiting the structural features of the tag network has a positive impact on the clustering process.
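The abstract does not give the exact form of the proximity measure, so the following is only a minimal sketch of the general idea, under stated assumptions. It counts tag co-occurrences across clips and blends the cosine similarity of the clip (text) vectors with that of the tag vectors. The linear blend, the `alpha` parameter, the sparse-dictionary vectors, and all function names are illustrative assumptions, not the authors' formulation; community detection and the community-based tag weighting are left out.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_network(clip_tags):
    """Count how often each unordered pair of tags appears in the same clip.

    The result maps (tag_a, tag_b), with tag_a < tag_b, to an edge weight.
    """
    edges = Counter()
    for tags in clip_tags:
        # Deduplicate and sort so each pair has one canonical orientation.
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # An empty vector is treated as similar to nothing.
    return dot / (norm_u * norm_v)


def weighted_similarity(text_a, tags_a, text_b, tags_b, alpha=0.5):
    """Assumed linear blend of text and tag similarity.

    alpha is a hypothetical parameter: 0 uses text only, 1 uses tags only.
    """
    return (1.0 - alpha) * cosine(text_a, text_b) + alpha * cosine(tags_a, tags_b)


# Toy data, invented for illustration only.
clip_tags = [
    ["politics", "election"],
    ["politics", "economy"],
    ["election", "politics"],
]
edges = cooccurrence_network(clip_tags)

sim = weighted_similarity(
    {"vote": 1.0, "poll": 1.0}, {"politics": 1.0, "election": 1.0},
    {"vote": 1.0, "tax": 1.0}, {"politics": 1.0, "economy": 1.0},
    alpha=0.5,
)
```

In this sketch, raising `alpha` makes clips that share tags cluster together even when their wording differs, which is one plausible reading of "socially biased" clusters. The tag vectors here use uniform weights; in the approach the abstract describes, those weights would instead be derived from the communities found in the co-occurrence network.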