A novel split and merge technique for hypertext classification

Authors:
Suman Saha;C. A. Murthy;Sankar K. Pal
Affiliations:
Center for Soft Computing Research, Indian Statistical Institute;Center for Soft Computing Research, Indian Statistical Institute;Center for Soft Computing Research, Indian Statistical Institute
Venue:
Transactions on Rough Sets XII
Year:
2010

Abstract

As the web grows at an increasing speed, hypertext classification is becoming a necessity. While the literature on text categorization is quite mature, the use of hypertext structure and hyperlinks has been relatively unexplored. In this paper, we introduce a novel split and merge technique for the classification of hypertext documents. The splitting process is performed at the feature level by representing the hypertext features in a tensor space model. We exploit the local structure and neighborhood recommendation encapsulated in this representation model. The merging process is performed on the multiple classifications obtained from the split representation. A meta-level decision system is formed from the predictions of base-level classifiers trained on different components of the tensor, together with the actual category of each hypertext document. The individual predictions for each component of the tensor are subsequently combined into a final prediction using rough set based ensemble classifiers. Experimental results show that our method classifies marginally better than existing hypertext classification techniques.
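
The split and merge idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the component names (`url`, `anchor`, `text`), the toy documents, and the token-counting base classifier are hypothetical, and the merge step uses a plain decision table with a majority-vote fallback as a simplified stand-in for the rough set based ensemble classifier.

```python
from collections import Counter, defaultdict

# Hypothetical component names; the abstract only says that features are
# split into components of a tensor.
COMPONENTS = ("url", "anchor", "text")


def train_base(docs, labels, component):
    """Count how often each token of one component occurs under each label."""
    counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        for token in doc[component].split():
            counts[token][label] += 1
    return counts


def predict_base(counts, doc, component, default):
    """Predict the label whose training tokens overlap most with this component."""
    votes = Counter()
    for token in doc[component].split():
        votes.update(counts.get(token, {}))
    return votes.most_common(1)[0][0] if votes else default


def fit(docs, labels):
    """Split: train one base classifier per component. Merge: build a decision table."""
    default = Counter(labels).most_common(1)[0][0]
    base = {c: train_base(docs, labels, c) for c in COMPONENTS}
    # Meta-level decision table: tuple of base predictions -> observed true labels.
    # This is a simplification, not a rough set reduct computation.
    table = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        key = tuple(predict_base(base[c], doc, c, default) for c in COMPONENTS)
        table[key][label] += 1
    return base, table, default


def predict(model, doc):
    """Combine the per-component predictions into one final label."""
    base, table, default = model
    key = tuple(predict_base(base[c], doc, c, default) for c in COMPONENTS)
    if key in table:
        return table[key].most_common(1)[0][0]
    # Unseen combination of base predictions: fall back to a majority vote.
    return Counter(key).most_common(1)[0][0]


# Toy training data, invented for this example.
docs = [
    {"url": "cs edu course", "anchor": "lecture notes", "text": "homework exam syllabus"},
    {"url": "cs edu course", "anchor": "syllabus", "text": "lecture exam grading"},
    {"url": "cs edu people", "anchor": "professor home", "text": "research publications teaching"},
    {"url": "edu people faculty", "anchor": "home page", "text": "publications research students"},
]
labels = ["course", "course", "faculty", "faculty"]

model = fit(docs, labels)
print(predict(model, {"url": "edu course", "anchor": "lecture", "text": "exam homework"}))
```

The decision table keeps the merge step easy to inspect: each row records which true categories were observed for a given combination of base-level predictions. A real system would replace the token-counting classifiers with stronger base learners and the table lookup with the rough set based combination the abstract describes.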