Independent component analysis for near-synonym choice

  • Authors:
  • Liang-Chih Yu; Wei-Nan Chien

  • Venue:
  • Decision Support Systems
  • Year:
  • 2013

Abstract

Despite their similar meanings, near-synonyms may be used differently in different contexts, so developing algorithms that can verify whether a near-synonym matches its given context has attracted increasing attention. Such algorithms have many applications, such as query expansion for information retrieval (IR), alternative word selection for writing support systems, and (near-)duplicate detection for text summarization. In this paper, we propose a framework that incorporates latent semantic analysis (LSA) and independent component analysis (ICA) to automatically select suitable near-synonyms according to the given context. LSA is used to discover useful latent features that do not frequently occur in the contexts of near-synonyms, and ICA is used to estimate a set of independent components by minimizing the dependence between features. A support vector machine (SVM) classifier is then trained on the independent components to predict the best near-synonym. In experiments, we evaluate the proposed method on both Chinese and English sentences and compare its performance with that of state-of-the-art supervised and unsupervised methods. Experimental results show that training on independent components, which contain useful contextual features with minimized term dependence, can improve the classifier's ability to discriminate among near-synonyms and thus yield better performance.
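The processing chain described above (context vectors, then LSA, then ICA, then an SVM) can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' implementation: bag-of-words context vectors, `TruncatedSVD` as the LSA step, `FastICA` as the ICA estimator, and a linear-kernel `SVC` are swapped-in choices, and the toy sentences, the near-synonym set ("error", "mistake", "blunder"), and the `N_LATENT` value are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import FastICA, TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data: each context is a sentence with the target word
# replaced by a gap ("___"); the label is the near-synonym that filled it.
contexts = [
    "a spelling ___ in the report",
    "the compiler reported a syntax ___",
    "a rounding ___ in the measurement",
    "the server returned an ___ code",
    "it was a big ___ to trust him",
    "everyone makes a ___ now and then",
    "i made a ___ by leaving early",
    "learn from your ___ and move on",
    "the general made a tactical ___ in battle",
    "a costly ___ by the minister",
    "an embarrassing ___ on live television",
    "the goalkeeper's ___ gifted them a goal",
]
labels = ["error"] * 4 + ["mistake"] * 4 + ["blunder"] * 4

# Hypothetical dimensionality; the abstract does not state the authors' setting.
N_LATENT = 6

model = make_pipeline(
    # Bag-of-words context vectors; the pattern keeps alphabetic tokens only,
    # so the "___" gap marker is not counted as a word.
    CountVectorizer(token_pattern=r"[a-z]{2,}"),
    # LSA step: project sparse term counts onto a few latent semantic dimensions.
    TruncatedSVD(n_components=N_LATENT, random_state=0),
    # ICA step: rotate the latent features into statistically independent components.
    FastICA(n_components=N_LATENT, whiten="unit-variance",
            max_iter=1000, random_state=0),
    # Classifier trained on the independent components.
    SVC(kernel="linear"),
)
model.fit(contexts, labels)

# Choose a near-synonym for unseen contexts.
test_contexts = ["a typing ___ in the code", "it was a foolish ___ to quit"]
predictions = model.predict(test_contexts)
print(list(predictions))

# The independent-component features the SVM actually sees.
ica_features = model[:-1].transform(contexts)
print(ica_features.shape)
```

Placing `FastICA` after `TruncatedSVD` follows the ordering in the abstract: LSA first compresses the sparse, high-dimensional context vectors, and ICA then re-expresses those few latent dimensions as components that are as independent as the estimator can make them. On a toy set this small the predicted labels are not meaningful; the sketch only shows how the stages fit together.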