In this paper we describe a method for acquiring word order from corpora. Word order is defined as the order of modifiers, i.e., the order of the phrasal units called 'bunsetsu' that depend on the same modifiee. The method uses a model that automatically discovers the tendencies of Japanese word order from various kinds of information in and around the target bunsetsus. The model shows to what extent each piece of information contributes to deciding word order, and which order tends to be selected when several kinds of information conflict. The contribution of each piece of information is learned efficiently within a maximum entropy framework. The trained model is evaluated by checking how many instances of word order selected by the model agree with those in the original text. We show that even a raw corpus that has not been tagged can be used to train the model, provided it is first analyzed by a parser; this is possible because the word order of the text in the corpus is already correct.
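The maximum entropy model described above can be illustrated with a minimal sketch. The feature set and training data below are hypothetical (the paper's actual features come from information in and around the target bunsetsus); with two classes ("modifier a precedes modifier b" vs. not), the maximum entropy model reduces to binary logistic regression, whose learned weights expose how much each piece of information contributes to the word-order decision, and agreement with the order observed in the original text serves as the evaluation measure.

```python
import math

def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_maxent(data, n_features, epochs=500, lr=0.5):
    """Two-class maximum entropy model (binary logistic regression),
    trained by stochastic gradient ascent on the log-likelihood.
    data: list of (feature_vector, label); label 1 means the order
    'a before b' was the one observed in the original text."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for f, y in data:
            g = y - sigmoid(dot(w, f))   # gradient for this example
            for i in range(n_features):
                w[i] += lr * g * f[i]
    return w

# Hypothetical features for a pair of bunsetsus depending on the same
# modifiee: [a is topic-marked, a is longer than b, bias term].
data = [
    ([1, 0, 1], 1),  # topic-marked modifier appeared first
    ([1, 1, 1], 1),
    ([0, 1, 1], 1),  # longer modifier appeared first
    ([0, 0, 1], 0),  # neither cue held; observed order was reversed
]
w = train_maxent(data, 3)

def agreement(w, data):
    """Fraction of instances whose predicted order matches the text."""
    hits = sum((sigmoid(dot(w, f)) > 0.5) == bool(y) for f, y in data)
    return hits / len(data)
```

The learned weight vector plays the role of the contribution rates: a large positive weight on a feature means that cue strongly favors placing the corresponding bunsetsu first, and competing cues are resolved by whichever weighted combination wins.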