Leveraging World Knowledge in Chinese Text Classification

  • Authors:
  • Shu Xu; Maosong Sun

  • Venue:
  • ALPIT '07 Proceedings of the Sixth International Conference on Advanced Language Processing and Web Information Technology (ALPIT 2007)
  • Year:
  • 2007

Abstract

State-of-the-art Text Classification (TC) approaches consider only features that appear explicitly in the training set, and after several decades of effort these approaches seem to have reached a plateau. In this paper, we propose an automatic taxonomy mapping algorithm that maps the original flat taxonomy to a hierarchical, human-edited online taxonomy, the Open Directory Project (ODP). From the mapped taxonomy we then synthesize new training samples carrying common-sense world knowledge by performing a constrained, focused web crawl. We show that leveraging domain knowledge that cannot be deduced directly from the training set gives the text classifier better generalization ability. Preliminary experimental results on several Chinese data sets confirm the effectiveness of this approach.
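
The abstract does not say how the taxonomy mapping is computed, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a simple keyword-overlap heuristic: each flat category is represented by its most frequent training tokens and mapped to the hierarchical node whose keyword list overlaps most (Jaccard similarity). The function names, the toy categories, and the ODP-style paths are all hypothetical, and the text is assumed to be already segmented into whitespace-separated tokens, as Chinese text would need to be.

```python
from collections import Counter


def top_terms(docs, k=5):
    """Return the k most frequent tokens across one category's documents."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    return {tok for tok, _ in counts.most_common(k)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_taxonomy(flat_training, hierarchy_nodes, k=5):
    """Map each flat label to the hierarchy path with the highest overlap.

    flat_training:   {label: [document, ...]}       (hypothetical input shape)
    hierarchy_nodes: {path: [keyword, ...]}         (hypothetical input shape)
    """
    mapping = {}
    for label, docs in flat_training.items():
        terms = top_terms(docs, k)
        mapping[label] = max(
            hierarchy_nodes,
            key=lambda path: jaccard(terms, set(hierarchy_nodes[path])),
        )
    return mapping


# Toy data, invented for illustration only.
flat_training = {
    "sports": ["football match score goal", "basketball match team score"],
    "finance": ["stock market bank price", "bank loan interest market"],
}
hierarchy_nodes = {
    "Top/Sports/Football": ["football", "goal", "match", "team"],
    "Top/Business/Banking": ["bank", "loan", "interest", "market"],
}

mapping = map_taxonomy(flat_training, hierarchy_nodes)
print(mapping)
```

In a pipeline like the one the abstract outlines, the mapped nodes would then serve as starting points for collecting additional training text. That crawling step is omitted here because the abstract gives no detail about its constraints.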