The Vector Space Model (VSM) has been at the core of information retrieval for decades. VSM treats documents as vectors in a high-dimensional space. In such a vector space, techniques such as Latent Semantic Indexing (LSI), Support Vector Machines (SVM), and Naive Bayes can then be applied for indexing and classification. However, in some cases the dimensionality of the document space is extremely large, which makes these techniques infeasible due to the curse of dimensionality. In this paper, we propose a novel Tensor Space Model for document analysis, in which documents are represented as second-order tensors, or matrices. Correspondingly, a novel indexing algorithm called Tensor Latent Semantic Indexing (TensorLSI) is developed in the tensor space. Our theoretical analysis shows that TensorLSI is much more computationally efficient than conventional Latent Semantic Indexing, which makes it applicable to extremely large-scale data sets. Experimental results on several standard document data sets demonstrate the efficiency and effectiveness of our algorithm.
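To make the second-order tensor representation concrete, the following is a minimal sketch and not the TensorLSI algorithm itself, which the abstract does not specify. It assumes a row-major reshape of a term vector into a near-square matrix with zero padding, and a two-sided projection of the form U^T X V. The function names, the layout, and the projection matrices are all hypothetical. The last helper only counts parameters, to suggest why working with two small matrices can be cheaper than one large one.

```python
import math


def to_second_order_tensor(vec):
    """Reshape a length-n term vector into an n_rows x n_cols matrix.

    Assumption for illustration: row-major layout, near-square shape,
    zero padding at the tail. The source does not prescribe a layout.
    """
    n = len(vec)
    if n == 0:
        raise ValueError("empty document vector")
    n_rows = math.ceil(math.sqrt(n))
    n_cols = -(-n // n_rows)  # integer ceiling of n / n_rows
    padded = list(vec) + [0] * (n_rows * n_cols - n)
    return [padded[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def bilinear_project(x, u, v):
    """Compute U^T X V, where U is n_rows x k1 and V is n_cols x k2."""
    return matmul(matmul(transpose(u), x), v)


def projection_sizes(n_rows, n_cols, k1, k2):
    """Parameter counts: one flat projection vs. two small ones.

    Flat: an (n_rows * n_cols) x (k1 * k2) matrix.
    Two-sided: U (n_rows x k1) plus V (n_cols x k2).
    """
    flat = n_rows * n_cols * k1 * k2
    two_sided = n_rows * k1 + n_cols * k2
    return flat, two_sided
```

For example, a vocabulary of one million terms reshaped to 1000 x 1000 and reduced to a 10 x 10 representation would need 10^8 parameters for a flat projection but only 2 x 10^4 for the two-sided one, under the assumptions above.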