A stochastic language model using dependency and its improvement by word clustering

Authors:
Shinsuke Mori;Makoto Nagao
Affiliations:
Tokyo Research Laboratory, IBM Japan, Ltd., Yamatoshi, Japan;Kyoto University, Kyoto, Japan
Venue:
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Year:
1998

References (5)

Class-based n-gram models of natural language. Computational Linguistics.
Introduction To Automata Theory, Languages, And Computation.
Three generative, lexicalised models for statistical parsing. ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics.
Decision tree parsing using a hidden derivation model. HLT '94 Proceedings of the workshop on Human Language Technology.
Statistical parsing with a context-free grammar and word statistics. AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence.

Cited by (4)

A stochastic parser based on a structural word prediction model. COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1.
Self-organizing Chinese and Japanese semantic maps. COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1.
Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel corpora. Neural Networks - 2004 Special issue: New developments in self-organizing systems.
Don't compare averages. WEA'05 Proceedings of the 4th international conference on Experimental and Efficient Algorithms.

Abstract

In this paper, we present a stochastic language model for Japanese that uses dependency. The prediction unit in this model is an attribute of a "bunsetsu", represented by the product of the head of its content words and the head of its function words. The relations between bunsetsu attributes are governed by a context-free grammar. Word sequences are predicted from the attribute using a word n-gram model, and the spelling of an unknown word is predicted using a character n-gram model. The model is robust in that it can compute the probability of an arbitrary string, and complete in that it simultaneously models everything from unknown words to dependency.
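
The robustness claim rests on the character-level component: any spelling over a known character set receives a non-zero probability. The sketch below illustrates that idea only and is not the authors' implementation. The class name `CharBigramModel`, the bigram order, the add-one smoothing, and the toy Latin alphabet are all assumptions made for the example; the abstract does not specify the n-gram order or the smoothing method.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical boundary symbols


class CharBigramModel:
    """Add-one smoothed character bigram model (illustrative assumption)."""

    def __init__(self, words, alphabet):
        self.alphabet = set(alphabet)
        # Symbols that can be predicted: every character plus end-of-word.
        self.vocab = self.alphabet | {EOS}
        self.bigrams = Counter()
        self.contexts = Counter()
        for word in words:
            symbols = [BOS] + list(word) + [EOS]
            for prev, cur in zip(symbols, symbols[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def prob(self, prev, cur):
        # Add-one smoothing keeps unseen character pairs above zero.
        numerator = self.bigrams[(prev, cur)] + 1
        denominator = self.contexts[prev] + len(self.vocab)
        return numerator / denominator

    def log_prob(self, word):
        # The model is only defined over the declared alphabet.
        unknown = set(word) - self.alphabet
        if unknown:
            raise ValueError(f"characters outside alphabet: {sorted(unknown)}")
        symbols = [BOS] + list(word) + [EOS]
        return sum(math.log(self.prob(p, c)) for p, c in zip(symbols, symbols[1:]))


# Toy usage: a spelling never seen in training still gets a finite score.
model = CharBigramModel(["abc", "abd"], alphabet="abcd")
print(model.log_prob("abc"), model.log_prob("dca"))
```

In a full model of the kind the abstract describes, a score like this would be combined with the word n-gram and grammar probabilities; how the components are combined is not stated in the abstract and is left out here.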