Automatic Tag Recommendation for Weblogs

  • Authors:
  • Yicen Liu; Mingrong Liu; Xing Chen; Liang Xiang; Qing Yang

  • Venue:
  • ITCS '09 Proceedings of the 2009 International Conference on Information Technology and Computer Science - Volume 01
  • Year:
  • 2009

Abstract

There has been much research on how to recommend tags for weblogs. In this paper, we propose a novel automatic tag recommendation algorithm that can process large-scale, real-time data effectively and efficiently. Most existing work on tag suggestion first mines the relationship between testing and training data and then assigns the top-ranked tags of the most related training data to the testing object. However, this approach ignores the internal relationship between tags and weblogs. According to our research, more than 43% of the tags assigned by weblog users actually appear in the body of the text. Meanwhile, the term frequency distribution, the paragraph frequency distribution, and the first occurrence position of tags differ markedly from those of non-tags in the text. In this paper, the tags of a weblog are assigned in two steps. First, probability distributions of the word attributes are trained on labeled training weblogs, and keywords of a testing weblog are extracted as one part of the tags based on these distributions. Then the remaining tags are derived from the first part with the help of the Latent Semantic Indexing (LSI) model. Experiments on a large-scale weblog tagging dataset show that the average tagging time for a new weblog is less than 0.02 seconds, and over 74% of testing weblogs are correctly labeled with the top 15 tags.
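The abstract names three word attributes (term frequency, paragraph frequency, and first occurrence position) but does not say how they are computed or combined. The following is a minimal Python sketch of one plausible reading, not the authors' implementation. The function names `word_attributes` and `log_odds`, the tokenization rule, the blank-line paragraph split, the normalization of each attribute to [0, 1], and the binned-histogram scoring are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def word_attributes(text):
    """Return {word: (term_freq, paragraph_freq, first_pos)} for one weblog body.

    Assumptions (not from the paper): paragraphs are separated by blank
    lines, words are lowercase alphanumeric runs, and every attribute is
    normalized to the range [0, 1].
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    tokens_per_par = [re.findall(r"[a-z0-9]+", p.lower()) for p in paragraphs]
    all_tokens = [t for toks in tokens_per_par for t in toks]
    total = len(all_tokens)
    if total == 0:
        return {}

    # Term frequency: occurrences of the word over all tokens.
    tf = Counter(all_tokens)
    # Paragraph frequency: number of paragraphs containing the word.
    pf = Counter(t for toks in tokens_per_par for t in set(toks))
    # First occurrence position: relative offset of the first appearance.
    first = {}
    for i, t in enumerate(all_tokens):
        first.setdefault(t, i / total)

    return {
        w: (tf[w] / total, pf[w] / len(paragraphs), first[w])
        for w in tf
    }


def log_odds(value, tag_hist, nontag_hist, eps=1e-6):
    """Log ratio of P(value | tag) to P(value | non-tag).

    Hypothetical scoring step: both histograms are equal-width bins over
    [0, 1], estimated from labeled training weblogs. Summing this score
    over the three attributes gives a naive-Bayes-style keyword score.
    """
    bins = len(tag_hist)
    b = min(int(value * bins), bins - 1)
    return math.log((tag_hist[b] + eps) / (nontag_hist[b] + eps))
```

Under these assumptions, candidate words of a new weblog would be ranked by the sum of their three `log_odds` scores, and the top-ranked words kept as the first part of the tags. The LSI expansion step is not sketched here.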