Effect of relationships between words on Japanese information retrieval

  • Authors:
  • Atsushi Matsumura;Atsuhiro Takasu;Jun Adachi

  • Affiliations:
  • University of Tsukuba, Ibaraki, Japan;National Institute of Informatics, Chiyoda-ku, Tokyo, Japan;National Institute of Informatics, Chiyoda-ku, Tokyo, Japan

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2006

Abstract

Two Japanese-language information retrieval (IR) methods that enhance retrieval effectiveness by utilizing the relationships between words are proposed. The first method uses dependency relationships between words in a sentence. The second method uses proximity relationships, particularly information about the ordered co-occurrence of words in a sentence, to approximate the dependency relationships between them. A Structured Index, which represents the dependency relationships between words in a sentence as a set of binary trees, has been constructed for these two methods. The Structured Index is created by morphological analysis and by dependency analysis based on simple template matching and on compound noun analysis derived from word statistics. Retrieval experiments using the Japanese test collection for information retrieval systems (NTCIR-1, the NACSIS Test Collection for IR systems) show that these two methods offer superior retrieval effectiveness compared with the TF-IDF method, and that they are effective with different databases and diverse search topic sets. There is little difference in retrieval effectiveness between the two methods.
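
To make the idea of ordered co-occurrence concrete, the following is a minimal Python sketch, not the authors' Structured Index or their scoring scheme. It assumes sentences have already been split into words by a morphological analyzer, treats every ordered word pair within a sentence as an extra index feature, and scores documents with a simplified TF-IDF weight. The function names (`ordered_pairs`, `index_features`, `document_frequency`, `rank`) and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def ordered_pairs(tokens):
    """Return every (earlier word, later word) pair from one tokenized sentence."""
    return [(tokens[i], tokens[j]) for i, j in combinations(range(len(tokens)), 2)]


def index_features(sentences):
    """Count single words and ordered word pairs over a list of tokenized sentences."""
    feats = Counter()
    for tokens in sentences:
        feats.update(tokens)
        feats.update(ordered_pairs(tokens))
    return feats


def document_frequency(doc_feats):
    """Count, for each feature, the number of documents that contain it."""
    df = Counter()
    for feats in doc_feats.values():
        df.update(feats.keys())
    return df


def rank(query_sentences, docs):
    """Rank document ids by a simplified TF-IDF score over words and ordered pairs."""
    doc_feats = {doc_id: index_features(sents) for doc_id, sents in docs.items()}
    df = document_frequency(doc_feats)
    n_docs = len(docs)
    query_feats = index_features(query_sentences)
    scores = {}
    for doc_id, feats in doc_feats.items():
        score = 0.0
        for f in query_feats:
            if f in feats:
                # Simplified weight (an assumption): term frequency times log(N / df).
                score += feats[f] * math.log(n_docs / df[f])
        scores[doc_id] = score
    return sorted(scores, key=lambda d: -scores[d]), scores


# Toy, pre-tokenized documents (hypothetical data, one sentence each).
docs = {
    "d1": [["情報", "検索"]],  # same word order as the query
    "d2": [["検索", "情報"]],  # same words, reversed order
    "d3": [["言語", "処理"]],  # unrelated words
}
ranking, scores = rank([["情報", "検索"]], docs)
```

In this sketch, a document that contains the query words in the same order matches an additional pair feature and therefore scores higher than one that contains the same words in the reverse order, which is the intuition behind using ordered co-occurrence as an approximation of dependency relationships.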