Hierarchically Classifying Chinese Web Documents without Dictionary Support and Segmentation Procedure

  • Authors:
  • Shuigeng Zhou;Ye Fan;Jiangtao Hu;Fang Yu;Yunfa Hu

  • Venue:
  • WAIM '00 Proceedings of the First International Conference on Web-Age Information Management
  • Year:
  • 2000

Abstract

This paper reports a system that hierarchically classifies Chinese Web documents without dictionary support or a segmentation procedure. In our classifier, Web documents are represented by N-grams (N≤4), which are easy to extract. A boosting machine learning approach is applied to classify Chinese Web documents that share a topic hierarchy. The open, modular system architecture makes our classifier extensible. Experimental results show that our system can classify Chinese Web documents effectively and efficiently.
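
The abstract does not say how the N-grams are extracted. As a minimal sketch of dictionary-free, segmentation-free feature extraction, the Python below counts overlapping character N-grams (N≤4) within each run of Chinese characters. The function names `is_cjk` and `char_ngrams`, the restriction to the CJK Unified Ideographs block, and the choice to break runs at punctuation are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def is_cjk(ch: str) -> bool:
    """Return True for characters in the CJK Unified Ideographs block (assumed scope)."""
    return "\u4e00" <= ch <= "\u9fff"


def char_ngrams(text: str, max_n: int = 4) -> Counter:
    """Count overlapping character N-grams (1 <= N <= max_n) in each run of CJK characters.

    No dictionary or word segmenter is used: N-grams are read directly
    off the character sequence and never span a non-CJK character.
    """
    counts: Counter = Counter()
    run: list[str] = []
    # The trailing space acts as a sentinel so the last run is flushed.
    for ch in text + " ":
        if is_cjk(ch):
            run.append(ch)
            continue
        for n in range(1, max_n + 1):
            for i in range(len(run) - n + 1):
                counts["".join(run[i:i + n])] += 1
        run = []
    return counts


features = char_ngrams("中文网页")
```

The resulting counts could serve as a sparse feature vector for any classifier, including a boosted one applied at each node of a topic hierarchy; how the original system weights or selects N-grams is not stated in the abstract.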