Effect of relationships between words on Japanese information retrieval

Authors:
Atsushi Matsumura;Atsuhiro Takasu;Jun Adachi
Affiliations:
University of Tsukuba, Ibaraki, Japan;National Institute of Informatics, Chiyoda-ku, Tokyo, Japan;National Institute of Informatics, Chiyoda-ku, Tokyo, Japan
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2006

Abstract

Two Japanese-language information retrieval (IR) methods that enhance retrieval effectiveness by utilizing the relationships between words are proposed. The first method uses dependency relationships between words in a sentence. The second method uses proximity relationships, particularly information about the ordered co-occurrence of words in a sentence, to approximate the dependency relationships between them. A Structured Index, which represents the dependency relationships between words in a sentence as a set of binary trees, has been constructed for these two methods. The Structured Index is created by morphological analysis and dependency analysis based on simple template matching and compound noun analysis derived from word statistics. Retrieval experiments using the Japanese test collection for information retrieval systems (NTCIR-1, the NACSIS Test Collection for IR systems) show that these two methods offer superior retrieval effectiveness compared with the TF-IDF method, and are effective with different databases and diverse search topic sets. There is little difference in retrieval effectiveness between these two methods.
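
To illustrate the idea behind the second method, the following is a minimal Python sketch of ordered co-occurrence matching. It is not the authors' implementation. It assumes sentences have already been segmented into words by a morphological analyzer, and the function names (`ordered_cooccurrences`, `pair_overlap`), the optional `window` parameter, and the overlap count used as a score are all hypothetical choices made for illustration.

```python
from itertools import combinations


def ordered_cooccurrences(tokens, window=None):
    """Return the set of ordered pairs (a, b) where a precedes b in the sentence.

    `tokens` is a list of already-segmented words (assumption: segmentation
    is done elsewhere). `window` is a hypothetical knob that limits how far
    apart the two words may be; None means anywhere in the sentence.
    """
    pairs = set()
    # combinations over indices always yields i < j, so order is preserved.
    for i, j in combinations(range(len(tokens)), 2):
        if window is None or j - i <= window:
            pairs.add((tokens[i], tokens[j]))
    return pairs


def pair_overlap(query_tokens, sentence_tokens, window=None):
    """Count the ordered query pairs that also occur, in order, in the sentence."""
    query_pairs = ordered_cooccurrences(query_tokens)
    sentence_pairs = ordered_cooccurrences(sentence_tokens, window)
    return len(query_pairs & sentence_pairs)


if __name__ == "__main__":
    query = ["情報", "検索"]
    same_order = ["日本語", "情報", "検索", "システム"]
    reversed_order = ["検索", "の", "情報"]
    print(pair_overlap(query, same_order))
    print(pair_overlap(query, reversed_order))
```

In this sketch a sentence that contains the query words in the same order contributes a matching pair, while a sentence that contains them in the opposite order does not, which is the sense in which ordered co-occurrence can stand in for a dependency relationship. How such matches would be weighted or combined with term statistics is not specified here.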