Error-driven HMM-based chunk tagger with context-dependent lexicon

Authors:
GuoDong Zhou; Jian Su
Affiliations:
Kent Ridge Digital Labs, Singapore; Kent Ridge Digital Labs, Singapore
Venue:
EMNLP '00 Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13
Year:
2000

Abstract

This paper proposes a new error-driven HMM-based text chunk tagger with a context-dependent lexicon. Compared with a standard HMM-based tagger, it uses a new hidden Markov modelling approach that incorporates more contextual information into each lexical entry. In addition, an error-driven learning approach reduces the memory requirement by keeping only positive lexical entries, which makes it possible to incorporate further context-dependent lexical entries. Experiments show that, when trained on sections 00-19 of the PENN WSJ TreeBank and tested on sections 20-24, the tagger achieves overall precision and recall of 93.40% and 93.95% for all chunk types, 93.60% and 94.64% for noun phrases, and 94.64% and 94.75% for verb phrases. In 25-fold validation experiments on the PENN WSJ TreeBank, overall precision and recall are 96.40% and 96.47% for all chunk types, 96.49% and 96.99% for noun phrases, and 97.13% and 97.36% for verb phrases.
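The precision and recall figures above are chunk-level scores. As a rough illustration of how such scores are typically computed, the following Python sketch counts a predicted chunk as correct only when its span and type both match a gold chunk exactly. The abstract does not state the tag encoding or the scoring procedure used, so the BIO-style tags (`B-NP`, `I-NP`, `O`) and the helper names `extract_chunks` and `precision_recall` are assumptions made for illustration, not the authors' evaluation code.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, chunk type); an assumed representation.
Chunk = Tuple[int, int, str]


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect (start, end, type) spans from a BIO tag sequence (assumed encoding)."""
    chunks: Set[Chunk] = set()
    start, ctype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last chunk
        # Close the open chunk unless this tag continues it.
        if ctype is not None and tag != "I-" + ctype:
            chunks.add((start, i, ctype))
            start, ctype = None, None
        # Open a chunk on B-, or on an I- tag when no chunk is open.
        if tag.startswith("B-") or (tag.startswith("I-") and ctype is None):
            start, ctype = i, tag[2:]
    return chunks


def precision_recall(gold_tags: List[str], pred_tags: List[str]) -> Tuple[float, float]:
    """Exact-match chunk precision and recall for one tagged sequence."""
    gold = extract_chunks(gold_tags)
    pred = extract_chunks(pred_tags)
    correct = len(gold & pred)  # chunks with identical span and type
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical five-token example: the tagger misses the final noun phrase.
gold = ["B-NP", "I-NP", "B-VP", "O", "B-NP"]
pred = ["B-NP", "I-NP", "B-VP", "O", "O"]
p, r = precision_recall(gold, pred)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case both predicted chunks are correct, so precision is 1.0, while one of the three gold chunks is missed, so recall is 2/3. Per-type scores such as those reported for noun phrases and verb phrases would follow by filtering both chunk sets on the type field before counting.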