Pattern dictionary development based on non-compositional language model for Japanese compound and complex sentences

  • Authors:
  • Satoru Ikehara;Masato Tokuhisa;Jin'ichi Murakami;Masashi Saraki;Masahiro Miyazaki;Naoshi Ikeda

  • Affiliations:
  • Tottori University, Tottori-city, Japan;Tottori University, Tottori-city, Japan;Tottori University, Tottori-city, Japan;Nihon University, Tokyo, Japan;Niigata University, Niigata-city, Japan;Gifu University, Gifu-city, Japan

  • Venue:
  • ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead
  • Year:
  • 2006

Abstract

A large-scale sentence pattern dictionary (SP-dictionary) for Japanese compound and complex sentences has been developed. The dictionary was compiled on the basis of the non-compositional language model. Sentences with 2 or 3 predicates are extracted from a Japanese-to-English parallel corpus of 1 million sentences, and the compositional constituents contained within them are generalized to produce an SP-dictionary containing a total of 215,000 pattern pairs. In evaluation tests, the SP-dictionary achieved a syntactic coverage of 92% and a semantic coverage of 70%.
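
As a rough illustration of what generalizing compositional constituents into a pattern pair might look like, the following toy sketch replaces aligned phrases in a Japanese sentence and its English translation with shared variables. It is not the authors' procedure: the `PatternPair` class, the `generalize` function, the `N1`, `N2` variable names, and the hand-supplied alignment list are all hypothetical, and the example sentence is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternPair:
    """A source-side pattern and its target-side counterpart (hypothetical type)."""
    source: str
    target: str


def generalize(source: str, target: str, aligned: list[tuple[str, str]]) -> PatternPair:
    """Replace each aligned (source phrase, target phrase) pair with a shared variable.

    Assumption: the alignment is given by hand. A phrase pair is skipped
    unless both halves occur literally in their sentences.
    """
    for index, (src_phrase, tgt_phrase) in enumerate(aligned, start=1):
        if src_phrase not in source or tgt_phrase not in target:
            continue  # leave non-matching constituents as literal text
        variable = f"N{index}"  # hypothetical variable naming scheme
        source = source.replace(src_phrase, variable, 1)
        target = target.replace(tgt_phrase, variable, 1)
    return PatternPair(source, target)


# Invented example sentence with two predicates (read, go to bed).
pair = generalize(
    "彼は本を読んでから寝た",
    "He went to bed after reading a book",
    [("彼", "He"), ("本", "a book")],
)
print(pair.source)  # N1はN2を読んでから寝た
print(pair.target)  # N1 went to bed after reading N2
```

In this sketch only the two noun phrases are treated as compositional and replaced; the remaining words stay literal, so the structure linking the two predicates is kept as a whole rather than being translated piece by piece.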