Incorporating linguistic structure into maximum entropy language models

  • Authors:
  • GaoLin Fang; Wen Gao; ZhaoQi Wang

  • Affiliations:
  • GaoLin Fang: Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, P.R. China
  • Wen Gao: Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, P.R. China, and Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100080, P.R. China
  • ZhaoQi Wang: Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100080, P.R. China

  • Venue:
  • Journal of Computer Science and Technology
  • Year:
  • 2003


Abstract

In statistical language modeling, integrating diverse linguistic knowledge into a general framework that captures long-distance dependencies is a challenging problem. This paper presents an improved language model that incorporates linguistic structure into the maximum entropy framework. The proposed model combines a trigram model with base-phrase structure knowledge: the trigram captures local relations between words, while the base-phrase structure represents long-distance relations between syntactic constituents. Knowledge of syntax, semantics, and vocabulary is thus integrated into a single maximum entropy framework. Experimental results show that, compared with the trigram model, the proposed model reduces perplexity by 24% and improves the sign language recognition rate by about 3%.
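
For reference, a model of this kind instantiates the standard conditional maximum entropy form (conventional notation, not necessarily the paper's own symbols): each feature f_i(h, w) fires on a property of the history h and the candidate next word w (here, trigram features on the two preceding words and base-phrase features on the surrounding syntactic structure) and carries a learned weight lambda_i:

% Standard conditional maximum entropy language model; a sketch in
% conventional notation, assuming feature functions f_i and weights
% \lambda_i as described in the abstract, not the paper's exact symbols.
P(w \mid h) = \frac{1}{Z(h)} \exp\Big( \sum_{i} \lambda_i f_i(h, w) \Big),
\qquad
Z(h) = \sum_{w' \in V} \exp\Big( \sum_{i} \lambda_i f_i(h, w') \Big)

The normalizer Z(h) sums over the vocabulary V, so the model stays a proper conditional distribution; in this family of models the weights are typically trained with generalized iterative scaling or a gradient method so that model feature expectations match their empirical counts.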