KACTL: knowware based automated construction of a treelike library from web documents

  • Authors:
  • Ruqian Lu; Yu Huang; Kai Sun; Zhongxiang Chen; Yiwen Chen; Songmao Zhang

  • Affiliations:
  • Ruqian Lu: Institute of Computing Technology, CAS Key Lab of IIP, China; Academy of Mathematics and Systems Science, CAS Key Lab of MADIS, China
  • Yu Huang: Institute of Computing Technology, CAS Key Lab of IIP, China
  • Kai Sun: Academy of Mathematics and Systems Science, CAS Key Lab of MADIS, China
  • Zhongxiang Chen: Institute of Computing Technology, CAS Key Lab of IIP, China
  • Yiwen Chen: Tianjin University, China
  • Songmao Zhang: Institute of Computing Technology, CAS Key Lab of IIP, China; Academy of Mathematics and Systems Science, CAS Key Lab of MADIS, China

  • Venue:
  • WISM'12: Proceedings of the 2012 International Conference on Web Information Systems and Mining
  • Year:
  • 2012

Abstract

This paper proposes a knowware-based supervised machine learning technique for domain-specific regression and classification of Web documents. The technique is simple: it relies only on word counting, without natural language understanding or complicated statistical techniques. The process starts by constructing a domain subdivision tree and assigning a training set of documents to its nodes; from these, the algorithm produces a labeled classification tree with a characteristic vector for each node. This tree can then be used to classify any number of documents in that domain. A Web portal development tool is also provided for building a Web site that displays the resulting treelike library of documents.
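The abstract does not say how a node's characteristic vector is computed or how a new document is matched against the tree. The following is a minimal sketch under stated assumptions, not the authors' method: each node's vector is assumed to be the relative word frequency over its own and its descendants' training documents, and classification is assumed to be a greedy top-down descent that follows the child with the highest cosine similarity. The names `Node`, `build_vectors`, and `classify` are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def word_counts(text: str) -> Counter:
    # Plain word counting: lowercase alphabetic tokens, no linguistic analysis.
    return Counter(re.findall(r"[a-z]+", text.lower()))


@dataclass
class Node:
    # Hypothetical node of the domain subdivision tree.
    label: str
    training_docs: list = field(default_factory=list)
    children: list = field(default_factory=list)
    vector: dict = field(default_factory=dict)


def build_vectors(node: Node) -> Counter:
    # ASSUMPTION: a node's characteristic vector is the relative word
    # frequency over its own training documents plus all descendants'.
    total = Counter()
    for doc in node.training_docs:
        total += word_counts(doc)
    for child in node.children:
        total += build_vectors(child)
    n = sum(total.values())
    node.vector = {w: c / n for w, c in total.items()} if n else {}
    return total  # raw counts, so the parent can aggregate them


def cosine(a: dict, b: dict) -> float:
    # Cosine similarity between two sparse word vectors.
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(root: Node, text: str) -> list:
    # ASSUMPTION: greedy descent, always following the most similar child.
    counts = word_counts(text)
    path = [root.label]
    node = root
    while node.children:
        node = max(node.children, key=lambda c: cosine(counts, c.vector))
        path.append(node.label)
    return path


root = Node("computing", children=[
    Node("databases", training_docs=["sql query index table join",
                                     "transaction table index"]),
    Node("networks", training_docs=["packet routing protocol tcp",
                                    "router packet latency"]),
])
build_vectors(root)
print(classify(root, "a tcp packet was dropped by the router"))
# prints ['computing', 'networks']
```

Greedy descent keeps classification cost proportional to tree depth times branching factor; a variant that scores every leaf would avoid early mistakes at the top levels at the price of visiting the whole tree.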