Lexicalized hidden Markov models for part-of-speech tagging

Authors:
Sang-Zoo Lee;Jun-ichi Tsujii;Hae-Chang Rim
Affiliations:
University of Tokyo, Tokyo, Japan;University of Tokyo, Tokyo, Japan;Korea University, Seoul, Korea
Venue:
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Year:
2000

Abstract

Because most previous work on HMM-based tagging considers only part-of-speech information in context, those models cannot use lexical information, which is crucial for resolving some morphological ambiguities. In this paper we introduce uniformly lexicalized HMMs for part-of-speech tagging in both English and Korean. The lexicalized models use a simplified back-off smoothing technique to overcome data sparseness. In experiments, the lexicalized models achieve higher accuracy than non-lexicalized models, and the back-off smoothing method mitigates data sparseness better than simple smoothing methods.
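
To make the idea of lexicalized contexts with back-off concrete, here is a minimal Python sketch. It is not the formulation from the paper, which the abstract does not spell out. It assumes one possible lexicalization, conditioning the next tag on the previous tag and the previous word, and a plain count-threshold fallback to the tag-only estimate with no discounting. The names `train`, `transition_prob`, and `min_count`, as well as the toy corpus, are hypothetical.

```python
from collections import Counter


def train(tagged_sentences):
    """Count lexicalized and tag-only transition events from (word, tag) sentences."""
    lex_ctx = Counter()    # (prev_tag, prev_word)
    lex_trans = Counter()  # (prev_tag, prev_word, tag)
    tag_ctx = Counter()    # prev_tag
    tag_trans = Counter()  # (prev_tag, tag)
    for sentence in tagged_sentences:
        prev_word, prev_tag = "<s>", "<s>"  # assumed sentence-start marker
        for word, tag in sentence:
            lex_ctx[(prev_tag, prev_word)] += 1
            lex_trans[(prev_tag, prev_word, tag)] += 1
            tag_ctx[prev_tag] += 1
            tag_trans[(prev_tag, tag)] += 1
            prev_word, prev_tag = word, tag
    return lex_ctx, lex_trans, tag_ctx, tag_trans


def transition_prob(model, prev_tag, prev_word, tag, min_count=1):
    """Estimate P(tag | prev_tag, prev_word), falling back to P(tag | prev_tag).

    Assumption: a simple threshold fallback, not a discounted back-off scheme.
    """
    lex_ctx, lex_trans, tag_ctx, tag_trans = model
    seen = lex_ctx[(prev_tag, prev_word)]
    if seen >= min_count:
        # Lexicalized context was observed often enough: use it directly.
        return lex_trans[(prev_tag, prev_word, tag)] / seen
    if tag_ctx[prev_tag] > 0:
        # Unseen lexicalized context: fall back to the tag-only estimate.
        return tag_trans[(prev_tag, tag)] / tag_ctx[prev_tag]
    return 0.0


# Toy corpus, invented for illustration only.
corpus = [
    [("they", "PRP"), ("can", "MD"), ("fish", "VB")],
    [("we", "PRP"), ("will", "MD"), ("go", "VB")],
    [("we", "PRP"), ("will", "MD"), ("not", "RB")],
]
model = train(corpus)
p_seen = transition_prob(model, "MD", "will", "VB")     # lexicalized estimate
p_unseen = transition_prob(model, "MD", "might", "VB")  # tag-only fallback
```

In this sketch the word `will` sharpens the estimate for the tag that follows a modal, while an unseen word such as `might` falls back to the estimate shared by all modals. A real tagger would also need emission probabilities, proper discounting, and a decoder such as Viterbi.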