CatRelate: a new hierarchical document category integration algorithm by learning category relationships

  • Authors:
  • Shanfeng Zhu;Christopher C. Yang;Wai Lam

  • Affiliations:
  • Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan;Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong;Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong

  • Venue:
  • ICADL'04 Proceedings of the 7th International Conference on Digital Libraries: International Collaboration and Cross-Fertilization
  • Year:
  • 2004

Abstract

We address the problem of integrating documents from a source catalog into a master catalog. Current techniques either treat it as a flat category integration problem, ignoring the useful hierarchy information in the catalogs, or handle it hierarchically but without a rigorous model. In contrast, our method is based on correctly identifying relationships among categories, such as Match, Disjoint, SubConcept, SuperConcept, and Overlap, which are derived from the relations between sets in set theory. Compared with the traditional Match/NotMatch relationship in the literature, our approach is more expressive in defining the relationship. The relationships among categories are first learned in a probabilistic way and then refined by considering the hierarchy context. Our preliminary experiments show that the method helps to identify category relationships correctly and thus increases the accuracy of document integration.
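
The five relationships named in the abstract correspond to the standard relations between two sets. The sketch below illustrates only those set-theoretic definitions on explicit sets of document identifiers; it is not the probabilistic learning or hierarchy-based refinement the paper describes. The names `Relation` and `relate`, and the convention that SubConcept means the source category is strictly contained in the master category, are assumptions made for illustration.

```python
from enum import Enum


class Relation(Enum):
    """Category relationships named in the abstract (labels taken from the text)."""
    MATCH = "Match"
    DISJOINT = "Disjoint"
    SUB_CONCEPT = "SubConcept"
    SUPER_CONCEPT = "SuperConcept"
    OVERLAP = "Overlap"


def relate(source: set, master: set) -> Relation:
    """Classify a source category against a master category by set relation.

    Assumption: each category is represented by the exact set of documents
    it contains, and the direction is read from the source category's side.
    """
    if source == master:
        return Relation.MATCH
    if source.isdisjoint(master):
        return Relation.DISJOINT
    if source < master:   # strict subset: source is narrower than master
        return Relation.SUB_CONCEPT
    if source > master:   # strict superset: source is broader than master
        return Relation.SUPER_CONCEPT
    return Relation.OVERLAP  # some shared documents, neither contains the other


# Hypothetical document identifiers, for illustration only.
master_category = {"d1", "d2", "d3", "d4"}
print(relate({"d1", "d2"}, master_category).value)
print(relate({"d3", "d9"}, master_category).value)
```

In practice the exact document sets of the two catalogs rarely coincide, which is presumably why the abstract speaks of learning the relationships probabilistically instead of testing them exactly as done here.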