Tensor Framework and Combined Symmetry for Hypertext Mining

Authors:
Suman Saha; C. A. Murthy; Sankar K. Pal
Affiliations:
Center for Soft Computing Research, Indian Statistical Institute, India. E-mail: {ssaha_r,murthy,sankar}@isical.ac.in
Venue:
Fundamenta Informaticae
Year:
2009

Abstract

We make a case here for using a tensor framework for hypertext mining. A tensor is a generalization of a vector, and the tensor framework discussed here generalizes the vector space model, which is widely used in the information retrieval and web mining literature. Most hypertext documents have an inherent internal tag structure and an external link structure, which make multidimensional representations, such as those offered by tensor objects, desirable. We focus on the advantages of the Tensor Space Model, in which documents are represented as sixth-order tensors, and we exploit the local structure and neighborhood recommendation encapsulated by this representation. We define a similarity measure for tensor objects corresponding to hypertext documents and evaluate the proposed measure on mining tasks. We demonstrate the superior performance of the proposed methodology on clustering and classification of hypertext documents. As the experiments show, the main advantage of the proposed model is that different types of similarity measures can be applied to the different components of a hypertext document. We also show theoretically that the computational complexity of an algorithm operating in the tensor framework, with the tensor similarity measure as its distance, is at most that of the same algorithm operating in the vector space model with a vector similarity measure as its distance.
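The idea of applying a different similarity measure to each component of a hypertext document can be illustrated with a small sketch. This is not the authors' method: the abstract does not define the six tensor modes or the exact combination rule, so the sketch simplifies a document to a mapping from a hypothetical component name (`title`, `body`, `outlinks`) to a sparse vector, and it combines per-component scores by a weighted average. The names `cosine`, `jaccard`, `combined_similarity`, and the weights are all illustrative assumptions.

```python
from math import sqrt
from typing import Callable, Dict

# Simplification (assumption): a document is a map from a hypothetical
# component name to a sparse term -> weight vector, not a true sixth-order tensor.
Component = Dict[str, float]
Document = Dict[str, Component]
Measure = Callable[[Component, Component], float]


def cosine(a: Component, b: Component) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a: Component, b: Component) -> float:
    """Jaccard overlap of the supports (nonzero keys) of two sparse vectors."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def combined_similarity(x: Document, y: Document,
                        measures: Dict[str, Measure],
                        weights: Dict[str, float]) -> float:
    """Weighted average of per-component similarities (illustrative rule).

    Each component uses its own measure; weights are assumed positive.
    """
    total = sum(weights[c] for c in measures)
    return sum(weights[c] * measures[c](x.get(c, {}), y.get(c, {}))
               for c in measures) / total


# Usage with made-up toy documents.
d1 = {"title": {"tensor": 1.0, "mining": 1.0},
      "body": {"tensor": 2.0, "hypertext": 1.0},
      "outlinks": {"a.html": 1.0, "b.html": 1.0}}
d2 = {"title": {"tensor": 1.0},
      "body": {"tensor": 1.0, "hypertext": 1.0},
      "outlinks": {"b.html": 1.0, "c.html": 1.0}}
measures = {"title": cosine, "body": cosine, "outlinks": jaccard}
weights = {"title": 1.0, "body": 2.0, "outlinks": 1.0}
print(round(combined_similarity(d1, d2, measures, weights), 3))  # → 0.734
```

In this sketch each per-component measure touches only that component's nonzero entries, so when the components partition the features the total work is bounded by a single pass over the concatenated vector. This loosely mirrors the complexity claim in the abstract under the sketch's own assumptions; it is not the paper's proof.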