Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel corpora

  • Authors:
  • Qing Ma;Kyoko Kanzaki;Yujie Zhang;Masaki Murata;Hitoshi Isahara

  • Affiliations:
  • Department of Applied Mathematics and Informatics, Faculty of Science and Technology, Ryukoku University, Seta, Otsu 520-2194, Japan;Keihanna Human Info-Communication Research Center, National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Kyoto 619-0289, Japan;Keihanna Human Info-Communication Research Center, National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Kyoto 619-0289, Japan;Keihanna Human Info-Communication Research Center, National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Kyoto 619-0289, Japan;Keihanna Human Info-Communication Research Center, National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Kyoto 619-0289, Japan

  • Venue:
  • Neural Networks - 2004 Special issue: New developments in self-organizing systems
  • Year:
  • 2004

Quantified Score

Hi-index 0.01

Abstract

This paper presents a method for self-organizing monolingual semantic maps: visible, continuous representations in which Chinese or Japanese words with similar meanings are placed at the same or neighboring points, so that the distance between words reflects their semantic similarity. We use the self-organizing map (SOM) as the self-organizing device. Each word to be mapped is defined by the set of words that co-occur with it in Chinese or Japanese newspapers, collected according to their grammatical relationships. The words are then coded into vectors that take into account the semantic correlation between them, established through a form of word-similarity computation, and these vectors are fed to the SOM. The resulting monolingual semantic maps are assessed numerically by accuracy, recall, and F-measure, as well as by intuition, and by comparison with a clustering method and with multivariate statistical analysis. The paper further discusses the possibility of extending the proposed method to construct Japanese-Chinese bilingual semantic maps, with the aim of providing a semantics-based approach to word alignment in Japanese-Chinese parallel corpora. We also show the effectiveness of this extended method through small-scale experiments comparing it with a baseline method, in which the alignment of Japanese and Chinese words is determined directly from the Euclidean distance between the vectors representing the words, with a clustering method, and with multivariate statistical analysis.
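
To make the idea of placing similar words at the same or neighboring map points concrete, the following is a minimal sketch of a generic SOM in plain Python. It is not the authors' implementation. The grid size, learning-rate and neighborhood schedules, the function names `train_som` and `best_matching_unit`, and the toy vectors are all assumptions chosen for illustration, and the input vectors here are arbitrary numbers standing in for the similarity-based word coding described in the abstract.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))  # squared Euclidean
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows=4, cols=4, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM (all hyperparameters are illustrative)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
        sigma = sigma0 + (0.5 - sigma0) * frac  # neighborhood shrinks toward 0.5
        order = list(data)
        rng.shuffle(order)
        for x in order:
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood centered on the winning node.
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical toy vectors: word_a/word_b are similar, as are word_c/word_d.
toy_vectors = {
    "word_a": [1.0, 0.9, 0.0, 0.1],
    "word_b": [0.9, 1.0, 0.1, 0.0],
    "word_c": [0.0, 0.1, 1.0, 0.9],
    "word_d": [0.1, 0.0, 0.9, 1.0],
}

som = train_som(list(toy_vectors.values()))
positions = {w: best_matching_unit(som, v) for w, v in toy_vectors.items()}
print(positions)  # map coordinates of each word
```

After training, each word is assigned to its best-matching node, and distances between those grid coordinates can be read as a rough proxy for semantic similarity. In the bilingual setting the abstract describes, Japanese and Chinese words would be placed on a shared map in this way, whereas the baseline compares the word vectors directly by Euclidean distance.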