Word order acquisition from corpora

  • Authors:
  • Kiyotaka Uchimoto; Masaki Murata; Qing Ma; Satoshi Sekine; Hitoshi Isahara

  • Affiliations:
  • Communications Research Laboratory, Ministry of Posts and Telecommunications, Hyogo, Japan; Communications Research Laboratory, Ministry of Posts and Telecommunications, Hyogo, Japan; Communications Research Laboratory, Ministry of Posts and Telecommunications, Hyogo, Japan; New York University, New York, NY; Communications Research Laboratory, Ministry of Posts and Telecommunications, Hyogo, Japan

  • Venue:
  • COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
  • Year:
  • 2000

Abstract

In this paper we describe a method of acquiring word order from corpora. Word order is defined here as the order of modifiers, that is, the order of the phrasal units called 'bunsetsu' that depend on the same modifiee. The method uses a model that automatically discovers word order tendencies in Japanese from various kinds of information in and around the target bunsetsus. The model shows to what extent each piece of information contributes to deciding the word order, and which word order tends to be selected when several kinds of information conflict. The contribution rate of each piece of information is learned efficiently within a maximum entropy framework. The performance of the trained model can be evaluated by checking how many instances of word order selected by the model agree with those in the original text. We also show that even a raw, untagged corpus can be used to train the model if it is first analyzed by a parser. This is possible because the text in the corpus already exhibits correct word order, so no manual word order annotation is needed.
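
The abstract does not give the model's features or its exact formulation, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a pairwise binary maximum entropy model that estimates the probability that one modifier precedes another, trains it on the orders observed in a corpus, and measures how often the highest-scoring order agrees with the original text. The features (`case`, `length`), the toy data, and all function names are hypothetical and are not taken from the paper.

```python
import math
from itertools import permutations


def pair_features(a, b):
    """Hypothetical features for the hypothesis 'a precedes b'."""
    return [
        f"case:{a['case']}>{b['case']}",
        f"longer_first:{a['length'] > b['length']}",
    ]


def prob_precedes(weights, a, b):
    """Two-outcome maximum entropy model: P(a before b) vs. P(b before a)."""
    s_ab = sum(weights.get(f, 0.0) for f in pair_features(a, b))
    s_ba = sum(weights.get(f, 0.0) for f in pair_features(b, a))
    return math.exp(s_ab) / (math.exp(s_ab) + math.exp(s_ba))


def train(sentences, epochs=200, lr=0.5):
    """Stochastic gradient ascent on the log-likelihood of observed orders.

    Each sentence is a list of modifiers of one modifiee, in corpus order.
    """
    weights = {}
    for _ in range(epochs):
        for mods in sentences:
            for i in range(len(mods)):
                for j in range(i + 1, len(mods)):
                    a, b = mods[i], mods[j]  # a was observed before b
                    step = lr * (1.0 - prob_precedes(weights, a, b))
                    for f in pair_features(a, b):
                        weights[f] = weights.get(f, 0.0) + step
                    for f in pair_features(b, a):
                        weights[f] = weights.get(f, 0.0) - step
    return weights


def best_order(weights, mods):
    """Return the permutation with the highest product of pairwise probabilities."""
    def score(order):
        return sum(
            math.log(prob_precedes(weights, order[i], order[j]))
            for i in range(len(order))
            for j in range(i + 1, len(order))
        )
    return list(max(permutations(mods), key=score))


def agreement(weights, sentences):
    """Fraction of sentences whose predicted order matches the original text."""
    hits = sum(best_order(weights, mods) == list(mods) for mods in sentences)
    return hits / len(sentences)


# Toy "parsed corpus": modifiers of one modifiee, listed in their original order.
sentences = [
    [{"case": "ga", "length": 2}, {"case": "wo", "length": 2}],
    [{"case": "ga", "length": 3}, {"case": "ni", "length": 2},
     {"case": "wo", "length": 1}],
    [{"case": "ni", "length": 4}, {"case": "wo", "length": 2}],
]

weights = train(sentences)
print("agreement with original text:", agreement(weights, sentences))
```

In this sketch the learned weights play the role of the contribution of each piece of information: a larger weight means that feature pushes more strongly toward the corresponding order. Enumerating all permutations is only practical for the handful of modifiers that typically share one modifiee.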