Hypertext Classification Using Tensor Space Model and Rough Set Based Ensemble Classifier

  • Authors:
  • Suman Saha;C. A. Murthy;Sankar K. Pal

  • Affiliations:
  • Center for Soft Computing Research, Indian Statistical Institute (all three authors)

  • Venue:
  • PReMI '09 Proceedings of the 3rd International Conference on Pattern Recognition and Machine Intelligence
  • Year:
  • 2009

Abstract

As the WWW grows at an increasing pace, classifiers targeted at hypertext are in high demand. While document categorization is quite mature, the use of hypertext structure and hyperlinks has been relatively unexplored. In this paper, we introduce a tensor space model for representing hypertext documents. We exploit the local structure and neighborhood recommendation encapsulated in the proposed representation model. Instead of using the text on a page to represent features in a vector space model, we use features on the page together with neighborhood features to represent a hypertext document in a tensor space model. A tensor similarity measure is defined. We demonstrate the use of a rough set based ensemble classifier on the proposed tensor space model. Experimental results show that classification with our method outperforms existing hypertext classification techniques.
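
The abstract does not spell out the tensor representation or the similarity measure, so the following is only an illustrative sketch and not the authors' formulation. It assumes a hypertext document can be split into a few components (the names `text`, `title`, `anchor`, and `neighbors` are hypothetical), keeps one bag-of-words vector per component instead of merging them into a single vector, and takes the mean per-component cosine similarity as a stand-in for a tensor similarity measure.

```python
import math
from collections import Counter

# Hypothetical component names; the abstract does not enumerate them.
COMPONENTS = ("text", "title", "anchor", "neighbors")


def to_tensor(doc):
    """Turn a dict of raw component strings into one term-count vector per component."""
    return {c: Counter(doc.get(c, "").lower().split()) for c in COMPONENTS}


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def tensor_similarity(x, y):
    """Assumed similarity: unweighted mean of per-component cosine similarities."""
    return sum(cosine(x[c], y[c]) for c in COMPONENTS) / len(COMPONENTS)


# Two toy pages; "neighbors" holds text gathered from linked pages.
page_a = to_tensor({
    "text": "rough set ensemble classifier",
    "title": "rough sets",
    "anchor": "classifier",
    "neighbors": "ensemble learning",
})
page_b = to_tensor({
    "text": "football match report",
    "title": "sports",
    "anchor": "scores",
    "neighbors": "league table",
})

print(tensor_similarity(page_a, page_a))
print(tensor_similarity(page_a, page_b))
```

Keeping the components separate means that a match in, say, anchor text is not diluted by a long page body, which is one plausible motivation for moving from a single vector to a multi-component representation. The equal weighting of components is a simplifying assumption.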