The vector space models for finding co-occurrence names as aliases in Thai sports news

  • Authors:
  • Thawatchai Suwanapong;Thanaruk Theeramunkong;Ekawit Nantajeewarawat

  • Affiliations:
  • Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand;Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand;Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand

  • Venue:
  • ACIIDS'10 Proceedings of the Second International Conference on Intelligent Information and Database Systems: Part I
  • Year:
  • 2010

Abstract

Discovering aliases in Thai sports news is a challenging task. This paper presents an approach to identifying aliases by analyzing co-occurrence relationships between named entities. Semantic similarity between names is computed using two vector space methods: Latent Semantic Analysis (LSA) and a correlation matrix (COM). The LSA method decomposes a name-by-document matrix (NDM) into singular-value and singular-vector matrices, and the truncated left singular vector matrix is used to measure name similarity. The COM method constructs a name-by-name matrix (NNM) from the NDM and then measures similarity among name vectors directly using simple calculations. Both methods use the same weighting schemes. The resulting similarity relations among names are then filtered by name type. Our preliminary experimental results show that the COM method performs better than the LSA method.
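
The two methods described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions the abstract does not confirm: the NNM is formed as NDM · NDMᵀ, similarity is measured with cosine similarity, the rows of the truncated left singular vector matrix are compared without scaling by singular values, and the weighting and name-type filtering steps are omitted. The toy matrix and the function names (`lsa_similarity`, `com_similarity`) are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Treat (near-)zero rows as unit length to avoid division by zero.
    norms = np.where(norms < 1e-12, 1.0, norms)
    unit = vectors / norms
    return unit @ unit.T


def lsa_similarity(ndm, k):
    """LSA sketch: truncated SVD of the NDM, then compare rows of U_k.

    Assumption: rows of the truncated left singular vector matrix are
    compared with cosine similarity, unscaled by the singular values.
    """
    u, _singular_values, _vt = np.linalg.svd(ndm, full_matrices=False)
    return cosine_similarity_matrix(u[:, :k])


def com_similarity(ndm):
    """COM sketch: build a name-by-name matrix, then compare its rows.

    Assumption: the NNM is the product NDM @ NDM.T, so entry (i, j)
    accumulates the co-occurrence of names i and j across documents.
    """
    nnm = ndm @ ndm.T
    return cosine_similarity_matrix(nnm)


# Hypothetical toy NDM: 3 names (rows) by 4 documents (columns), raw counts.
# Names 0 and 1 appear in the same documents; name 2 appears elsewhere.
ndm = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])

lsa = lsa_similarity(ndm, k=2)
com = com_similarity(ndm)

print("LSA similarity:\n", np.round(lsa, 3))
print("COM similarity:\n", np.round(com, 3))
```

In this toy case both sketches rate names 0 and 1 as highly similar and name 2 as unrelated to either. On real data the two methods can diverge, since LSA discards the dimensions beyond rank k while the COM sketch uses the full co-occurrence pattern.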