In this paper we describe a method for acquiring word order from corpora. Word order is defined as the order of modifiers, i.e., the order of the phrasal units called 'bunsetsu' that depend on the same modifiee. The method uses a model that automatically discovers the tendencies of Japanese word order from various kinds of information in and around the target bunsetsus. The model shows to what extent each piece of information contributes to deciding word order, and which order tends to be selected when several kinds of information conflict. The contribution of each piece of information is learned efficiently within a maximum entropy framework. The trained model is evaluated by checking how many instances of word order selected by the model agree with those in the original text. We show that even a raw corpus that has not been tagged can be used to train the model, provided it is first analyzed by a parser; this is possible because the word order of the text in the corpus is already correct.
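The maximum entropy model described above can be illustrated with a minimal sketch. The feature set and training data below are hypothetical (the paper's actual features come from information in and around the target bunsetsus); with two classes ("modifier a precedes modifier b" vs. not), the maximum entropy model reduces to binary logistic regression, whose learned weights expose how much each piece of information contributes to the word-order decision, and agreement with the order observed in the original text serves as the evaluation measure.

```python
import math

def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_maxent(data, n_features, epochs=500, lr=0.5):
    """Two-class maximum entropy model (binary logistic regression),
    trained by stochastic gradient ascent on the log-likelihood.
    data: list of (feature_vector, label); label 1 means the order
    'a before b' was the one observed in the original text."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for f, y in data:
            g = y - sigmoid(dot(w, f))   # gradient for this example
            for i in range(n_features):
                w[i] += lr * g * f[i]
    return w

# Hypothetical features for a pair of bunsetsus depending on the same
# modifiee: [a is topic-marked, a is longer than b, bias term].
data = [
    ([1, 0, 1], 1),  # topic-marked modifier appeared first
    ([1, 1, 1], 1),
    ([0, 1, 1], 1),  # longer modifier appeared first
    ([0, 0, 1], 0),  # neither cue held; observed order was reversed
]
w = train_maxent(data, 3)

def agreement(w, data):
    """Fraction of instances whose predicted order matches the text."""
    hits = sum((sigmoid(dot(w, f)) > 0.5) == bool(y) for f, y in data)
    return hits / len(data)
```

The learned weight vector plays the role of the contribution rates: a large positive weight on a feature means that cue strongly favors placing the corresponding bunsetsu first, and competing cues are resolved by whichever weighted combination wins.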