Enhancing multilingual latent semantic analysis with term alignment information

Authors:
Brett W. Bader;Peter A. Chew
Affiliations:
Sandia National Laboratories, Albuquerque, NM;Sandia National Laboratories, Albuquerque, NM
Venue:
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Year:
2008

Citing 5
Cited 4

Using linear algebra for intelligent information retrieval

SIAM Review
Matrix computations (3rd ed.)

Matrix computations (3rd ed.)
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
Statistical phrase-based translation

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Co-ranking Authors and Documents in a Heterogeneous Network

ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining

Affiliation recommendation using auxiliary networks

Proceedings of the fourth ACM conference on Recommender systems
Cross-lingual document representation and semantic similarity measure: a fuzzy set and rough set based approach

IEEE Transactions on Fuzzy Systems
An information-theoretic, vector-space-model approach to cross-language information retrieval*

Natural Language Engineering
Scalable Affiliation Recommendation using Auxiliary Networks

ACM Transactions on Intelligent Systems and Technology (TIST)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Latent Semantic Analysis (LSA) is based on the Singular Value Decomposition (SVD) of a term-by-document matrix for identifying relationships among terms and documents from cooccurrence patterns. Among the multiple ways of computing the SVD of a rectangular matrix X, one approach is to compute the eigenvalue decomposition (EVD) of a square 2 x 2 composite matrix consisting of four blocks with X and XT in the off-diagonal blocks and zero matrices in the diagonal blocks. We point out that significant value can be added to LSA by filling in some of the values in the diagonal blocks (corresponding to explicit term-to-term or document-to-document associations) and computing a term-by-concept matrix from the EVD. For the case of multilingual LSA, we incorporate information on cross-language term alignments of the same sort used in Statistical Machine Translation (SMT). Since all elements of the proposed EVD-based approach can rely entirely on lexical statistics, hardly any price is paid for the improved empirical results. In particular, the approach, like LSA or SMT, can still be generalized to virtually any language(s); computation of the EVD takes similar resources to that of the SVD since all the blocks are sparse; and the results of EVD are just as economical as those of SVD.