Finding aliases on the web using latent semantic analysis
Data & Knowledge Engineering - Special issue: WIDM 2002
Understanding Search Engines: Mathematical Modeling and Text Retrieval (Software, Environments, Tools), Second Edition
Automatically Extracting Personal Name Aliases from the Web
GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
Discovering aliases in Thai sports news is a challenging task. This paper presents an approach that identifies aliases by analyzing co-occurrence relationships between named entities. Semantic similarity between names is computed with two vector-based methods: Latent Semantic Analysis (LSA) and a correlation matrix (COM). The LSA method decomposes a name-by-document matrix (NDM) into singular-value and singular-vector matrices, and the truncated left singular vector matrix is used to measure name similarity. The COM method constructs a name-by-name matrix (NNM) from the NDM and then measures similarity among the name vectors directly, using simple calculations. Both methods apply the same weighting schemes. The resulting similarity relations among names are then filtered by name type. Our preliminary experimental results show that the COM method performs better than the LSA method.
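The two methods can be illustrated on a toy matrix. The following is a minimal sketch, not the authors' implementation, and it rests on several assumptions the abstract does not confirm: raw counts stand in for the unspecified weighting schemes, the NNM is taken to be the product of the NDM with its transpose, cosine similarity is used for both methods, and the type filter keeps only candidates of the same name type as the query. All names, types, counts, and helper functions below are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = vectors / norms
    return unit @ unit.T

def lsa_name_similarity(ndm, k):
    """LSA: keep the first k left singular vectors of the NDM as name vectors."""
    u, _s, _vt = np.linalg.svd(ndm, full_matrices=False)
    return cosine_similarity_matrix(u[:, :k])

def com_name_similarity(ndm):
    """COM: build a name-by-name matrix and compare its rows directly.

    Assumption: NNM = NDM @ NDM.T (co-occurrence strength between names).
    """
    nnm = ndm @ ndm.T
    return cosine_similarity_matrix(nnm)

def top_candidate(sim, names, types, query_index):
    """Most similar other name with the same type as the query (assumed filter)."""
    scores = sim[query_index].copy()
    for j in range(len(names)):
        if j == query_index or types[j] != types[query_index]:
            scores[j] = -np.inf  # exclude self and names of a different type
    return names[int(np.argmax(scores))]

# Hypothetical toy data: rows are names, columns are documents (raw counts).
names = ["name_a", "alias_a", "name_b", "name_c"]
types = ["person", "person", "team", "person"]
ndm = np.array([
    [2, 1, 0, 0, 1],
    [2, 1, 0, 0, 1],
    [0, 0, 3, 1, 0],
    [0, 1, 0, 2, 1],
], dtype=float)

lsa_sim = lsa_name_similarity(ndm, k=2)
com_sim = com_name_similarity(ndm)

print("LSA candidate for name_a:", top_candidate(lsa_sim, names, types, 0))
print("COM candidate for name_a:", top_candidate(com_sim, names, types, 0))
```

The toy data says nothing about which method is more accurate; it only shows where the two differ. LSA compares names in a reduced k-dimensional space obtained from the decomposition, while COM compares rows of the name-by-name matrix without any decomposition.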