Tensor space model for document analysis

Authors:
Deng Cai;Xiaofei He;Jiawei Han
Affiliations:
UIUC;Yahoo! Research Labs;UIUC
Venue:
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2006

Abstract

The Vector Space Model (VSM) has been at the core of information retrieval for the past decades. VSM treats documents as vectors in a high-dimensional space. In such a vector space, techniques like Latent Semantic Indexing (LSI), Support Vector Machines (SVM), Naive Bayes, etc., can then be applied for indexing and classification. However, in some cases the dimensionality of the document space might be extremely large, which makes these techniques infeasible due to the curse of dimensionality. In this paper, we propose a novel Tensor Space Model for document analysis. We represent documents as second-order tensors, or matrices. Correspondingly, a novel indexing algorithm called Tensor Latent Semantic Indexing (TensorLSI) is developed in the tensor space. Our theoretical analysis shows that TensorLSI is much more computationally efficient than conventional Latent Semantic Indexing, which makes it applicable to extremely large-scale data sets. Several experimental results on standard document data sets demonstrate the efficiency and effectiveness of our algorithm.
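
To make the matrix representation concrete, the following is a minimal sketch, not the paper's TensorLSI algorithm. It folds a flat term-weight vector of length n into an n1 x n2 matrix and evaluates a bilinear function u^T X v, which needs n1 + n2 parameters where a linear function on the original vector needs n. The row-major folding, the zero-padding, and the names `near_square_shape`, `fold_to_matrix`, and `bilinear` are assumptions made for illustration; the abstract does not say how terms are arranged in the matrix.

```python
import math


def near_square_shape(n):
    """Pick an (n_rows, n_cols) shape with n_rows * n_cols >= n, close to square.

    Assumption: any shape that can hold the n terms would do; near-square
    is chosen here only because it keeps n_rows + n_cols small.
    """
    n_rows = math.isqrt(n)
    if n_rows * n_rows < n:
        n_rows += 1
    n_cols = -(-n // n_rows)  # ceiling division
    return n_rows, n_cols


def fold_to_matrix(vec, n_rows, n_cols):
    """Fold a flat term-weight vector into a matrix, row by row.

    Assumption: row-major order with zero-padding of the unused tail.
    """
    if len(vec) > n_rows * n_cols:
        raise ValueError("matrix is too small to hold the vector")
    padded = list(vec) + [0.0] * (n_rows * n_cols - len(vec))
    return [padded[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def bilinear(u, X, v):
    """Evaluate u^T X v for a matrix X given as a list of rows."""
    return sum(u[i] * X[i][j] * v[j]
               for i in range(len(u))
               for j in range(len(v)))


# A toy document over a 10-term vocabulary (hypothetical term weights).
doc_vector = [1, 0, 2, 0, 0, 3, 0, 1, 0, 0]
shape = near_square_shape(len(doc_vector))   # 4 rows, 3 columns
doc_matrix = fold_to_matrix(doc_vector, *shape)

# A bilinear function on the matrix uses 4 + 3 = 7 parameters,
# compared with 10 for a linear function on the original vector.
u = [1, 1, 1, 1]
v = [1, 1, 1]
score = bilinear(u, doc_matrix, v)
```

The saving is small for a 10-term toy vocabulary, but it grows with the vocabulary size: for n terms folded into a near-square matrix, the parameter count of the bilinear function is on the order of 2 times the square root of n rather than n.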