Text Representation: From Vector to Tensor

  • Authors:
  • Ning Liu;Benyu Zhang;Jun Yan;Zheng Chen;Wenyin Liu;Fengshan Bai;Leefeng Chien

  • Affiliations:
  • Tsinghua University;Microsoft Research Asia;Peking University;Microsoft Research Asia;City University of Hong Kong;Tsinghua University;Academia Sinica

  • Venue:
  • ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
  • Year:
  • 2005

Abstract

In this paper, we propose a text representation model, the Tensor Space Model (TSM), which represents text as a high-order tensor from multilinear algebra instead of the traditional vector. Backed by the techniques of multilinear algebra, TSM offers a powerful mathematical framework for analyzing multifactor structures. TSM also comes with a set of operations and tools, such as the High-Order Singular Value Decomposition (HOSVD), for dimension reduction and other applications. Experimental results on the 20 Newsgroups dataset show that TSM consistently outperforms the Vector Space Model (VSM) for text classification.
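To make the idea of HOSVD-based dimension reduction concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not say how a document is turned into a tensor, so `text_to_tensor` is a hypothetical construction that counts character trigrams over 26 letters plus space; the function names and the truncation ranks are likewise assumptions chosen for illustration.

```python
import string

import numpy as np

# Assumption: a 27-symbol alphabet (a-z plus space); not taken from the abstract.
ALPHABET = string.ascii_lowercase + " "
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def text_to_tensor(text):
    """Hypothetical 3rd-order representation: counts of character trigrams."""
    cleaned = "".join(ch if ch in INDEX else " " for ch in text.lower())
    tensor = np.zeros((len(ALPHABET),) * 3)
    for a, b, c in zip(cleaned, cleaned[1:], cleaned[2:]):
        tensor[INDEX[a], INDEX[b], INDEX[c]] += 1
    return tensor


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)
    return np.moveaxis(moved @ matrix.T, -1, mode)


def hosvd(tensor, ranks):
    """Truncated HOSVD: one factor matrix per mode plus a smaller core tensor."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Left singular vectors of the mode-n unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto the kept subspace
    return core, factors


def reconstruct(core, factors):
    """Map a core tensor back to the original space."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


if __name__ == "__main__":
    doc = text_to_tensor("tensor space model for text representation")
    core, factors = hosvd(doc, ranks=(5, 5, 5))
    print(doc.shape, "->", core.shape)
```

Here the reduced core tensor plays the role that a low-dimensional vector plays in VSM. Truncating every mode to its full size reproduces the original tensor up to floating-point error, while smaller ranks trade reconstruction accuracy for compactness.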