Combining global and local information for enhanced deep classification

  • Authors:
  • Heung-Seon Oh;Yoonjung Choi;Sung-Hyon Myaeng

  • Affiliations:
  • Korea Advanced Institute of Science and Technology, Guseong-dong, Yuseong-gu, Daejeon, South Korea;Korea Advanced Institute of Science and Technology, Guseong-dong, Yuseong-gu, Daejeon, South Korea;Korea Advanced Institute of Science and Technology, Guseong-dong, Yuseong-gu, Daejeon, South Korea

  • Venue:
  • Proceedings of the 2010 ACM Symposium on Applied Computing
  • Year:
  • 2010

Abstract

Compared to traditional text classification with a flat category set or a small hierarchy of categories, classifying web pages into a large-scale hierarchy such as the Open Directory Project (ODP) or the Yahoo! Directory is challenging. While a recently proposed "deep" classification method makes the problem tractable, it still suffers from low classification performance. A major problem is the lack of training data, which is unavoidable with such a huge hierarchy: the training pages associated with the category nodes are short, and their distribution is skewed. To alleviate the problem, we propose a new training data selection strategy and a naïve Bayes combination model, which utilize both local and global information. We conducted a series of experiments with the ODP hierarchy, which contains more than 100,000 categories, to show that using both local and global information indeed helps avoid the training data sparseness problem, outperforming the state-of-the-art method.