Language dynamics and capitalization using maximum entropy

  • Authors:
  • Fernando Batista;Nuno Mamede;Isabel Trancoso

  • Affiliations:
  • L2F -- Spoken Language Systems Laboratory - INESC ID Lisboa, Lisboa, Portugal and ISCTE -- Instituto de Ciências do Trabalho e da Empresa, Portugal;L2F -- Spoken Language Systems Laboratory - INESC ID Lisboa, Lisboa, Portugal and IST -- Instituto Superior Técnico, Portugal;L2F -- Spoken Language Systems Laboratory - INESC ID Lisboa, Lisboa, Portugal and IST -- Instituto Superior Técnico, Portugal

  • Venue:
  • HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
  • Year:
  • 2008

Abstract

This paper studies the impact of written language variations and the way they affect the capitalization task over time. A discriminative approach, based on maximum entropy models, is proposed to perform capitalization, taking the language changes into consideration. The proposed method makes it possible to use large corpora for training. The evaluation is performed on newspaper corpora from different testing periods. The results reveal a strong relation between capitalization performance and the time elapsed between the training and testing data periods.
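The abstract frames capitalization as a discriminative, per-word classification problem solved with a maximum entropy model. The following is a minimal Python sketch of that idea, not the authors' implementation. The three-way label set (`lower`, `first_cap`, `all_upper`), the word-window features, the stochastic gradient training with a small L2 (Gaussian) penalty, and all function names are assumptions made for illustration.

```python
import math
from collections import defaultdict

# Assumed label set: one capitalization class per word (hypothetical, for illustration).
LABELS = ["lower", "first_cap", "all_upper"]


def case_label(word):
    """Map a word as written in the corpus to its capitalization class."""
    if len(word) > 1 and word.isupper():
        return "all_upper"
    if word[:1].isupper():
        return "first_cap"
    return "lower"


def features(tokens, i):
    """Assumed feature set: current, previous and next word (lowercased) plus a bias."""
    prev_w = tokens[i - 1] if i > 0 else "<s>"
    next_w = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return ["w=" + tokens[i], "prev=" + prev_w, "next=" + next_w, "bias"]


def probs(weights, feats):
    """Maximum entropy model: p(y | x) proportional to exp(sum of active feature weights)."""
    scores = [sum(weights[(f, y)] for f in feats) for y in LABELS]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train(sentences, epochs=100, lr=0.2, l2=1e-3):
    """Stochastic gradient ascent on the log-likelihood with a small L2 penalty."""
    weights = defaultdict(float)
    data = []
    for sent in sentences:
        lowered = [w.lower() for w in sent]
        for i, word in enumerate(sent):
            data.append((features(lowered, i), LABELS.index(case_label(word))))
    for _ in range(epochs):
        for feats, gold in data:
            p = probs(weights, feats)
            for k, y in enumerate(LABELS):
                # Log-likelihood gradient: observed minus expected feature count.
                grad = (1.0 if k == gold else 0.0) - p[k]
                for f in feats:
                    # L2 shrinkage applied only to active features (a common simplification).
                    weights[(f, y)] += lr * (grad - l2 * weights[(f, y)])
    return weights


def capitalize(weights, tokens):
    """Restore case for a token sequence by picking the most probable class per word."""
    lowered = [w.lower() for w in tokens]
    out = []
    for i, word in enumerate(lowered):
        p = probs(weights, features(lowered, i))
        label = LABELS[p.index(max(p))]
        if label == "all_upper":
            out.append(word.upper())
        elif label == "first_cap":
            out.append(word[:1].upper() + word[1:])
        else:
            out.append(word)
    return out


if __name__ == "__main__":
    # Toy corpus, purely illustrative.
    corpus = [
        ["The", "minister", "visited", "Lisboa"],
        ["The", "NATO", "summit", "opened"],
        ["Lisboa", "hosted", "the", "summit"],
    ]
    model = train(corpus)
    print(" ".join(capitalize(model, ["the", "nato", "summit", "opened"])))
```

Because the model only needs feature counts collected one word at a time, training streams naturally over large corpora. To mirror the temporal evaluation described in the abstract, one would train on an earlier period of the corpus and measure capitalization accuracy on progressively later periods; the toy corpus above cannot show that effect.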