A rough set approach to classifying web page without negative examples

  • Authors:
  • Qiguo Duan; Duoqian Miao; Kaimin Jin

  • Affiliations:
  • Department of Computer Science and Technology, Tongji University, Shanghai, China and The Key Laboratory of "Embedded System and Service Computing", Ministry of Education, Shanghai, China

  • Venue:
  • PAKDD'07: Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2007

Abstract

This paper studies the problem of building Web page classifiers from positive and unlabeled examples, and proposes a more principled technique for solving it based on the tolerance rough set model and Support Vector Machines (SVM). The method uses tolerance classes to approximate the concepts present in Web pages and to enrich the representation of those pages, and it draws an initial approximation of the negative examples. It then iteratively runs SVM to build margin-maximizing classifiers that progressively refine this approximation of the negative examples, so that the class boundary eventually converges to the true boundary of the positive class in the feature space. Experimental results show that the proposed method significantly outperforms existing methods.
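The pipeline described above (tolerance-class enrichment, an initial set of negatives, then repeated SVM training) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Documents are treated as plain sets of terms. Tolerance classes follow the common tolerance rough set model, in which a term's class holds every term that co-occurs with it in at least `theta` documents and a page is enriched to its upper approximation. The initial negatives come from a 1-DNF-style heuristic (as used in PEBL) because the abstract does not say how the authors seed them. The SVM is a homogeneous linear SVM trained with Pegasos subgradient descent instead of a library solver. All function names (`tolerance_classes`, `enrich`, `initial_negatives`, `train_linear_svm`, `pu_classify`) and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """Tolerance class of t: t plus every term co-occurring with t in >= theta docs."""
    co = Counter()
    for d in docs:
        for a, b in combinations(sorted(d), 2):
            co[(a, b)] += 1
    classes = {t: {t} for d in docs for t in d}
    for (a, b), n in co.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def enrich(doc, classes):
    """Upper approximation of a document: every term whose tolerance class meets it."""
    return frozenset(t for t, cls in classes.items() if cls & doc)


def initial_negatives(pos, unl):
    """Assumed 1-DNF-style seed (not from the paper): unlabeled docs with no
    'positive feature', i.e. no term relatively more frequent in pos than in unl."""
    df_p = Counter(t for d in pos for t in d)
    df_u = Counter(t for d in unl for t in d)
    pos_feats = {t for t in df_p if df_p[t] / len(pos) > df_u[t] / len(unl)}
    return [i for i, d in enumerate(unl) if not (d & pos_feats)]


def train_linear_svm(pos, neg, lam=0.01, epochs=50):
    """Pegasos subgradient descent on the hinge loss; homogeneous (no bias term)."""
    data = [(d, 1) for d in pos] + [(d, -1) for d in neg]
    w, step = {}, 0
    for _ in range(epochs):
        for x, y in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = y * sum(w.get(t, 0.0) for t in x)
            for t in w:  # regularization: shrink every existing weight
                w[t] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss active: step towards y * x
                for t in x:
                    w[t] = w.get(t, 0.0) + eta * y
    return lambda x: sum(w.get(t, 0.0) for t in x)  # score > 0 means positive


def pu_classify(pos_raw, unl_raw, theta=2):
    """Retrain the SVM, moving unlabeled docs scored <= 0 into the negatives,
    until no remaining unlabeled doc crosses the boundary."""
    classes = tolerance_classes(pos_raw + unl_raw, theta)
    pos = [enrich(d, classes) for d in pos_raw]
    unl = [enrich(d, classes) for d in unl_raw]
    neg_idx = initial_negatives(pos, unl)
    remaining = [i for i in range(len(unl)) if i not in neg_idx]
    while True:
        score = train_linear_svm(pos, [unl[i] for i in neg_idx])
        new_neg = [i for i in remaining if score(unl[i]) <= 0]
        if not new_neg:  # converged: the boundary no longer moves
            return classes, neg_idx, score
        neg_idx += new_neg
        remaining = [i for i in remaining if i not in new_neg]


if __name__ == "__main__":
    # Hypothetical toy data: positive pages are about SVMs, unlabeled pages mix topics.
    P = [frozenset({"svm", "kernel"}), frozenset({"svm", "margin"}),
         frozenset({"svm", "kernel", "margin"})]
    U = [frozenset({"svm", "margin"}), frozenset({"football", "goal"}),
         frozenset({"football", "league"}), frozenset({"league", "goal", "football"}),
         frozenset({"football", "kernel"})]
    classes, neg, score = pu_classify(P, U)
    print(sorted(neg))  # [1, 2, 3, 4]
```

In the toy run, the fifth unlabeled page shares the term `kernel` with the positive class, so the 1-DNF seed misses it; the first trained SVM nevertheless scores it below zero, and the loop moves it into the negative set. The loop stops once no remaining unlabeled page falls on the negative side of the boundary.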