Wiki3C: exploiting Wikipedia for context-aware concept categorization

  • Authors:
  • Peng Jiang;Huiman Hou;Lijiang Chen;Shimin Chen;Conglei Yao;Chengkai Li;Min Wang

  • Affiliations:
  • HP Labs China, Beijing, China;Baidu, Beijing, China;HP Labs China, Beijing, China;HP Labs China, Beijing, China;Tencent, Beijing, China;University of Texas at Arlington, Arlington, USA;HP Labs China, Beijing, China

  • Venue:
  • Proceedings of the sixth ACM international conference on Web search and data mining
  • Year:
  • 2013

Abstract

Wikipedia is an important human-generated knowledge base containing over 21 million articles organized into millions of categories. In this paper, we exploit Wikipedia for a new text mining task: Context-aware Concept Categorization, in which concepts are categorized according to the context in which they appear. Drawing on the article link features and category structure of Wikipedia, we introduce Wiki3C, an unsupervised, domain-independent, context-based approach to concept categorization. Within this approach, we investigate two strategies for selecting and filtering the Wikipedia articles used to represent each category. In addition, we employ a probabilistic model to compute the semantic relatedness between two concepts in Wikipedia. Experimental evaluation against manually labeled ground truth shows that Wiki3C achieves a noticeable improvement over baselines that do not consider contextual information.