Improving part-of-speech tagging using lexicalized HMMs

Authors:
Ferran Pla; Antonio Molina
Affiliations:
Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, Camí de Vera, s/n, 46020 València, Spain (both authors). E-mail: fpla@dsci.upv.es
Venue:
Natural Language Engineering
Year:
2004

Abstract

We introduce a simple method for building Lexicalized Hidden Markov Models (L-HMMs) that improve the accuracy of part-of-speech tagging. The technique enriches the contextual language model with a set of empirically selected words. The evaluation was conducted with different lexicalization criteria on the Penn Treebank corpus using the TnT tagger. Lexicalization reduced the tagging error by about 6% on unseen test data, without reducing the efficiency of the system. We have also studied how linguistic resources, such as dictionaries and morphological analyzers, improve tagging performance. Furthermore, an exhaustive experimental comparison shows that Lexicalized HMMs yield results that are better than or similar to those of other state-of-the-art part-of-speech tagging approaches. Finally, we have applied Lexicalized HMMs to the Spanish corpus LexEsp.
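The abstract does not spell out how the selected words enter the contextual model. The following sketch assumes one simple reading. In the training corpus, the tag of every selected word is replaced by a word-specific tag (for example `IN#that`, built with a hypothetical `#` separator). An ordinary HMM is then trained on the relabeled corpus, and the decoded tags are mapped back to the original tag set. A first-order (bigram) HMM with add-one smoothing stands in for TnT, which is a smoothed trigram tagger. The function names, the toy corpus, and the selected-word set are hypothetical and for illustration only.

```python
import math
from collections import Counter, defaultdict

SEP = "#"  # hypothetical separator between a tag and the word it is specialized for


def lexicalize(tagged_sents, selected):
    """Give each selected word a word-specific tag, e.g. ("that", "IN") -> ("that", "IN#that"),
    so the tag-transition (contextual) model can condition on that word. Assumed scheme."""
    return [[(w, f"{t}{SEP}{w.lower()}" if w.lower() in selected else t) for w, t in sent]
            for sent in tagged_sents]


def delexicalize(tag):
    """Map a lexicalized tag back to the original tag set ("IN#that" -> "IN")."""
    return tag.split(SEP, 1)[0]


class BigramHMM:
    """Toy first-order HMM tagger with add-one smoothing; a stand-in for TnT's
    interpolated trigram model, not a reimplementation of it."""

    def __init__(self, tagged_sents):
        self.trans = defaultdict(Counter)  # previous tag -> counts of next tag
        self.emit = defaultdict(Counter)   # tag -> counts of emitted (lowercased) words
        for sent in tagged_sents:
            prev = "<s>"
            for w, t in sent:
                self.trans[prev][t] += 1
                self.emit[t][w.lower()] += 1
                prev = t
            self.trans[prev]["</s>"] += 1
        self.tags = sorted(self.emit)
        self.n_next = len(self.tags) + 1  # possible successors: every tag plus </s>
        self.n_words = len({w for c in self.emit.values() for w in c}) + 1  # +1 for unknown words

    @staticmethod
    def _logp(counts, key, size):
        # Add-one smoothed log probability of `key` under the distribution `counts`.
        return math.log((counts[key] + 1) / (sum(counts.values()) + size))

    def tag(self, words):
        """Viterbi decoding: the most probable tag sequence for `words`."""
        if not words:
            return []
        ws = [w.lower() for w in words]

        def t_lp(prev, nxt):
            return self._logp(self.trans[prev], nxt, self.n_next)

        def e_lp(t, w):
            return self._logp(self.emit[t], w, self.n_words)

        # Each column maps tag -> (best log prob of a prefix ending in that tag, previous tag).
        cols = [{t: (t_lp("<s>", t) + e_lp(t, ws[0]), None) for t in self.tags}]
        for w in ws[1:]:
            prev_col = cols[-1]
            cols.append({
                t: max((prev_col[p][0] + t_lp(p, t) + e_lp(t, w), p) for p in self.tags)
                for t in self.tags
            })
        last = max(self.tags, key=lambda t: cols[-1][t][0] + t_lp(t, "</s>"))
        path = [last]
        for col in reversed(cols[1:]):  # follow back-pointers from the last word
            path.append(col[path[-1]][1])
        return path[::-1]


# Hypothetical toy corpus, for illustration only.
TOY_CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]

if __name__ == "__main__":
    lex_tagger = BigramHMM(lexicalize(TOY_CORPUS, {"the"}))
    lex_tags = lex_tagger.tag(["the", "cat", "runs"])
    print(lex_tags, [delexicalize(t) for t in lex_tags])
    # prints ['DT#the', 'NN', 'VBZ'] ['DT', 'NN', 'VBZ']
```

On the toy corpus, lexicalizing *the* gives it its own tag `DT#the`. As a result, the transition counts out of that tag are kept separate from those of other determiners, and `delexicalize` recovers the standard tags after decoding. In a realistic setting, the selected-word set would come from the empirical selection criteria the abstract mentions, and any gain would be measured on held-out data.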