Exploit semantic information for category annotation recommendation in Wikipedia

  • Authors:
  • Yang Wang;Haofen Wang;Haiping Zhu;Yong Yu

  • Affiliations:
  • Department of Computer Science and Engineering, Shanghai JiaoTong University, Shanghai, P. R. China;Department of Computer Science and Engineering, Shanghai JiaoTong University, Shanghai, P. R. China;Department of Computer Science and Engineering, Shanghai JiaoTong University, Shanghai, P. R. China;Department of Computer Science and Engineering, Shanghai JiaoTong University, Shanghai, P. R. China

  • Venue:
  • NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Abstract

Compared with plain-text resources, resources on "semi-semantic" web sites such as Wikipedia contain high-level semantic information that can benefit various automatic annotation tasks on those resources. In this paper, we propose a "collaborative annotating" approach that automatically recommends categories for a Wikipedia article by reusing category annotations from its most similar articles and ranking these annotations by confidence. The approach investigates four typical semantic features in Wikipedia, namely incoming links, outgoing links, section headings and template items, and uses them to represent articles for the similarity calculation. The experimental results show that these semantic features improve category annotation performance compared with the plain-text feature, and demonstrate the strength of our approach in discovering missing annotations and annotations at the proper level for Wikipedia articles.
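
The recommendation idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the confidence formula, or the number of neighbours, so everything below is an assumption made for illustration: articles are represented as sets of feature strings, similarity is Jaccard overlap, and a category's confidence is the sum of the similarities of the neighbours that carry it. The function names, the parameter `k`, and the toy corpus are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard overlap of two feature sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_categories(target, corpus, k=2):
    """Rank candidate categories for `target` by summed neighbour similarity.

    target: set of feature strings for the article to annotate
    corpus: dict mapping article title -> (feature set, category set)
    k:      number of most similar articles to reuse (hypothetical parameter)
    """
    # Score every annotated article against the target.
    scored = [(jaccard(target, feats), title)
              for title, (feats, _) in corpus.items()]
    # Keep the k most similar articles that share at least one feature.
    neighbours = sorted((s for s in scored if s[0] > 0), reverse=True)[:k]

    # Each neighbour votes for its categories, weighted by its similarity.
    confidence = defaultdict(float)
    for sim, title in neighbours:
        for category in corpus[title][1]:
            confidence[category] += sim

    # Highest confidence first; ties are broken alphabetically.
    return sorted(confidence.items(), key=lambda item: (-item[1], item[0]))


# Toy data: features mix link, section heading and template item tokens.
corpus = {
    "Article A": ({"link:Physics", "heading:Early life", "template:birth_date"},
                  {"Physicists", "Nobel laureates"}),
    "Article B": ({"link:Physics", "heading:Early life", "link:Chemistry"},
                  {"Physicists"}),
    "Article C": ({"link:Football", "heading:Club career"},
                  {"Footballers"}),
}
target = {"link:Physics", "heading:Early life",
          "template:birth_date", "link:Chemistry"}

ranking = recommend_categories(target, corpus, k=2)
for category, score in ranking:
    print(f"{category}: {score:.2f}")
```

In this toy run the category shared by both similar articles ranks above the one carried by only a single neighbour, and the unrelated article contributes nothing. A real system would draw the four feature types from Wikipedia itself and would likely weight them differently.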