A novel split and merge technique for hypertext classification

  • Authors: Suman Saha; C. A. Murthy; Sankar K. Pal
  • Affiliation (all authors): Center for Soft Computing Research, Indian Statistical Institute
  • Venue: Transactions on Rough Sets XII
  • Year: 2010

Abstract

As the web grows at an increasing speed, hypertext classification is becoming a necessity. While the literature on text categorization is quite mature, the use of hypertext structure and hyperlinks has been relatively unexplored. In this paper, we introduce a novel split and merge technique for the classification of hypertext documents. The splitting process is performed at the feature level by representing the hypertext features in a tensor space model. We exploit the local structure and neighborhood recommendation encapsulated in this representation model. The merging process is performed on the multiple classifications obtained from the split representation. A meta-level decision system is formed from the predictions of base-level classifiers, each trained on a different component of the tensor, together with the actual category of each hypertext document. The individual predictions for the components of the tensor are subsequently combined into a final prediction using rough set based ensemble classifiers. Experimental results show that classification with our method is marginally better than with other existing hypertext classification techniques.
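The split and merge flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the component names (`text`, `title`, `anchor`), the bag-of-words centroid base learner, and the toy data are all assumptions made for illustration, and the rough set based ensemble is replaced by a plain decision-table lookup with a majority-vote fallback.

```python
from collections import Counter, defaultdict

# Hypothetical component names; the paper's tensor components may differ.
COMPONENTS = ("text", "title", "anchor")


class CentroidClassifier:
    """Tiny bag-of-words centroid classifier (a stand-in base learner)."""

    def fit(self, docs, labels):
        self.centroids = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.centroids[label].update(tokens)
        return self

    def predict_one(self, tokens):
        counts = Counter(tokens)

        def score(label):
            centroid = self.centroids[label]
            total = sum(centroid.values()) or 1
            return sum(n * centroid[t] for t, n in counts.items()) / total

        # Iterating labels in sorted order makes tie-breaking deterministic.
        return max(sorted(self.centroids), key=score)


def split(doc):
    """Split step: one token list per component of the document."""
    return {c: doc.get(c, "").lower().split() for c in COMPONENTS}


class SplitMergeClassifier:
    def fit(self, docs, labels):
        parts = [split(d) for d in docs]
        # One base classifier per component.
        self.base = {
            c: CentroidClassifier().fit([p[c] for p in parts], labels)
            for c in COMPONENTS
        }
        # Meta-level decision table: base predictions -> observed categories.
        # Simplification: built on the training set itself, not held-out data.
        self.table = defaultdict(Counter)
        for p, label in zip(parts, labels):
            self.table[self._base_predictions(p)][label] += 1
        return self

    def _base_predictions(self, parts):
        return tuple(self.base[c].predict_one(parts[c]) for c in COMPONENTS)

    def predict(self, doc):
        """Merge step: look up the tuple of base predictions in the table."""
        key = self._base_predictions(split(doc))
        if key in self.table:
            return self.table[key].most_common(1)[0][0]
        # Unseen combination: fall back to a majority vote (an assumption).
        return Counter(key).most_common(1)[0][0]


# Toy data, invented purely for illustration.
train_docs = [
    {"text": "football match goal league", "title": "sports news", "anchor": "match report"},
    {"text": "team wins league cup", "title": "football results", "anchor": "sports scores"},
    {"text": "python code compiler bug", "title": "programming blog", "anchor": "code tutorial"},
    {"text": "software release compiler update", "title": "tech news", "anchor": "programming guide"},
]
train_labels = ["sports", "sports", "tech", "tech"]

model = SplitMergeClassifier().fit(train_docs, train_labels)
print(model.predict({"text": "league goal", "title": "football", "anchor": "match"}))
```

The decision table here plays the role of the meta-level decision system: its condition attributes are the base-level predictions and its decision attribute is the actual category. A rough set based ensemble would derive rules from such a table rather than performing the exact-match lookup used in this sketch.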