Analysis of Japanese compound nouns by direct text scanning

Authors:
Toru Hisamitsu; Yoshihiko Nitta
Affiliations:
Advanced Research Laboratory, Hitachi, Ltd., Saitama, Japan; Advanced Research Laboratory, Hitachi, Ltd., Saitama, Japan
Venue:
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Year:
1996

Abstract

This paper analyzes the word dependency structure of compound nouns appearing in Japanese newspaper articles. The analysis is difficult because such compound nouns can be quite long, have no word boundaries between their constituent nouns, and often contain unregistered words such as abbreviations. The lack of segmentation and the presence of unregistered words cause initial segmentation errors, which in turn lead to erroneous analyses. This paper presents a corpus-based approach that scans a corpus with a set of pattern matchers and gathers co-occurrence examples for analyzing compound nouns. It employs a bootstrapping search to cope with unregistered words: if an unregistered word is found while searching for examples, it is recorded and triggers additional searches that gather the examples containing it. This makes it possible to correct initial over-segmentation errors and leads to higher accuracy. The accuracy of the method is evaluated on compound nouns of length 5, 6, 7, and 8. A baseline is also introduced for comparison.
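
The bootstrapping search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the pattern matchers, so the two の-based patterns, the rough kanji character range, the function name `bootstrap_search`, and the toy corpus and lexicon below are all assumptions chosen for illustration. The sketch only shows the worklist idea: each unregistered word found during a scan is recorded and triggers a further scan for examples containing it.

```python
import re
from collections import deque

# Rough range for CJK ideographs; an assumption, not the paper's definition.
KANJI = "[\u4e00-\u9fff]"


def bootstrap_search(corpus, seeds, lexicon):
    """Gather (left, right) co-occurrence pairs, re-searching for each
    unregistered word discovered along the way (hypothetical helper)."""
    pairs = set()
    unregistered = []      # newly discovered words, in discovery order
    seen = set(seeds)      # words already searched or scheduled
    queue = deque(seeds)

    while queue:
        word = queue.popleft()
        w = re.escape(word)
        # Two hypothetical pattern matchers: "X の word" and "word の X".
        left = re.compile(f"({KANJI}+)の{w}(?!{KANJI})")
        right = re.compile(f"(?<!{KANJI}){w}の({KANJI}+)")

        for sentence in corpus:
            found = [(m.group(1), word) for m in left.finditer(sentence)]
            found += [(word, m.group(1)) for m in right.finditer(sentence)]
            for pair in found:
                pairs.add(pair)
                other = pair[0] if pair[1] == word else pair[1]
                if other not in lexicon and other not in seen:
                    # Unregistered word: record it and schedule a new search.
                    seen.add(other)
                    unregistered.append(other)
                    queue.append(other)
    return pairs, unregistered


# Toy data (assumed): 日銀 is an abbreviation missing from the lexicon.
corpus = ["日銀の総裁が発言した", "日銀の政策"]
lexicon = {"総裁", "発言", "政策"}
pairs, unregistered = bootstrap_search(corpus, ["総裁"], lexicon)
print(sorted(pairs), unregistered)
```

In this toy run the search starts from 総裁 only. The pair (日銀, 政策) is reached solely because 日銀 was first recorded as an unregistered word and then searched for in turn, which is the behavior the abstract attributes to the bootstrapping step.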