Learning-based concept-hierarchy refinement through exploiting topology, content and social information

  • Authors:
  • Tsung-Ting Kuo; Shou-De Lin

  • Affiliations:
  • National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan; National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2011

Abstract

Concept hierarchies, such as the ACM Computing Classification System and the InterPro Protein Sequence Classification, are widely used in categorization and indexing applications. In the Internet and Web 2.0 era, new concepts and terms emerge almost daily, so such hierarchies must be kept up to date. This paper proposes a mechanism to identify the most suitable position at which to insert a new term into an existing concept hierarchy. The problem is challenging because there are hundreds or even thousands of candidate positions for insertion. Furthermore, there is usually no training instance available for an insertion, nor is it practical to assume that a detailed description of the target concept is available beyond what the hierarchy itself provides. To resolve the problem, we exploit topology, content, and social information, and apply a learning approach to identify the underlying construction criteria of the concept hierarchy. We use three metrics (accuracy, taxonomic closeness, and ranking) to evaluate the proposed learning-based approach on the ACM CCS, DOAJ, and InterPro datasets. The results demonstrate that, on all three metrics, our approach outperforms similarity-based approaches, such as the Normalized Google Distance, by a significant margin. Finally, we propose a level-based recommendation scheme as a novel application of our system. The source code, dataset, and other related resources are available at http://www.csie.ntu.edu.tw/~d97944007/refinement/.
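
To make the similarity-based baseline named in the abstract concrete, the following is a minimal sketch of how candidate parent concepts might be ranked for a new term using the Normalized Google Distance. It is not the authors' learning-based method and is not taken from their released code. The function names `ngd` and `rank_candidates`, the candidate labels, and all hit counts are hypothetical and chosen only for illustration; in practice the counts would come from a search engine or corpus index.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance computed from hit counts.

    fx, fy : number of pages containing term x and term y, respectively
    fxy    : number of pages containing both terms
    n      : total number of indexed pages (an assumed constant)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never co-occur are treated as infinitely distant.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def rank_candidates(new_term_count, candidates, n):
    """Rank candidate parent concepts by increasing NGD to the new term.

    candidates maps a candidate name to a pair
    (hit count of the candidate, co-occurrence count with the new term).
    """
    scored = [
        (ngd(new_term_count, fy, fxy, n), name)
        for name, (fy, fxy) in candidates.items()
    ]
    return [name for _, name in sorted(scored)]


# Hypothetical counts, for illustration only.
TOTAL_PAGES = 1_000_000
candidates = {
    "Candidate A": (2000, 500),
    "Candidate B": (2000, 10),
    "Candidate C": (5000, 0),
}
ranking = rank_candidates(1000, candidates, TOTAL_PAGES)
print(ranking)
```

The position of the true parent in such a ranking corresponds to the kind of ranking metric the abstract mentions. A baseline of this form uses only term co-occurrence, whereas the paper's approach additionally learns from topology, content, and social information.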