Error-driven HMM-based chunk tagger with context-dependent lexicon

  • Authors:
  • GuoDong Zhou; Jian Su

  • Affiliations:
  • Kent Ridge Digital Labs, Singapore; Kent Ridge Digital Labs, Singapore

  • Venue:
  • EMNLP '00: Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13
  • Year:
  • 2000

Abstract

This paper proposes a new error-driven HMM-based text chunk tagger with a context-dependent lexicon. Compared with a standard HMM-based tagger, this tagger uses a new Hidden Markov Modelling approach that incorporates more contextual information into each lexical entry. Moreover, an error-driven learning approach is adopted to reduce the memory requirement by keeping only positive lexical entries, which in turn makes it possible to incorporate further context-dependent lexical entries. Experiments show that, when trained on PENN WSJ TreeBank sections 00-19 and tested on sections 20-24, this technique achieves overall precision and recall rates of 93.40% and 93.95% for all chunk types, 93.60% and 94.64% for noun phrases, and 94.64% and 94.75% for verb phrases. In 25-fold validation experiments on the PENN WSJ TreeBank, it achieves overall precision and recall rates of 96.40% and 96.47% for all chunk types, 96.49% and 96.99% for noun phrases, and 97.13% and 97.36% for verb phrases.
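
For readers unfamiliar with how chunk-level precision and recall are typically computed, the following is a minimal sketch and not the authors' evaluation code. It assumes chunks are encoded with BIO tags (for example `B-NP`, `I-NP`, `O`), which the abstract does not specify, and it counts a predicted chunk as correct only if its type and both boundaries match a gold chunk. The function names `extract_chunks` and `precision_recall` are hypothetical.

```python
from typing import List, Set, Tuple

# A chunk is (chunk type, start index, end index exclusive).
Chunk = Tuple[str, int, int]


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect chunk spans from a BIO tag sequence (assumed encoding)."""
    chunks: Set[Chunk] = set()
    start, ctype = None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        # Close the open chunk on O, on any B-*, or on an I-* of another type.
        if ctype is not None and (
            tag == "O" or tag.startswith("B-") or tag[2:] != ctype
        ):
            chunks.add((ctype, start, i))
            start, ctype = None, None
        # Open a chunk on B-*, or on a stray I-* when no chunk is open.
        if tag != "O" and ctype is None:
            start, ctype = i, tag[2:]
    return chunks


def precision_recall(gold: List[str], pred: List[str]) -> Tuple[float, float]:
    """Exact-match chunk precision and recall for one tag sequence."""
    gold_chunks = extract_chunks(gold)
    pred_chunks = extract_chunks(pred)
    correct = len(gold_chunks & pred_chunks)
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy tags invented for illustration; not data from the paper.
    gold = ["B-NP", "I-NP", "B-VP", "O", "B-NP"]
    pred = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP"]
    p, r = precision_recall(gold, pred)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy input the predicted final noun phrase has the wrong left boundary, so only two of the three predicted chunks match and both measures fall below 1. Per-type figures such as those reported for noun phrases and verb phrases would follow by filtering the chunk sets on the type field before counting.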