Mutual information independence model using kernel density estimation for segmenting and labeling sequential data

Authors:
Guodong Zhou;Lingpeng Yang;Jian Su;Donghong Ji
Affiliations:
Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore
Venue:
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2005

Abstract

This paper proposes a Mutual Information Independence Model (MIIM) for segmenting and labeling sequential data. MIIM overcomes the strong context-independence assumption of traditional generative HMMs by instead assuming a novel pairwise mutual information independence. As a result, MIIM models the long-range state dependence generatively in its state transition model and, separately, the observation dependence discriminatively in its output model. In addition, a variable-length pairwise mutual information-based modeling approach is proposed to capture the long-range state dependence, and a kNN algorithm using kernel density estimation is proposed to capture the observation dependence. An evaluation on shallow parsing shows that MIIM can effectively capture long-range context dependence when segmenting and labeling sequential data. Notably, using kernel density estimation yields better performance than using a classifier-based approach.
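
The abstract does not spell out how the kNN algorithm with kernel density estimation works. As a rough illustration only, the sketch below shows one common way to combine the two ideas: the k nearest training examples vote for a label, and each vote is weighted by a Gaussian kernel on its distance to the query. The function names, the Euclidean distance, the Gaussian kernel, the `bandwidth` parameter, and the toy chunk labels are all assumptions made for this example; they are not taken from the paper and may differ from the authors' formulation.

```python
import math
from collections import defaultdict


def gaussian_kernel(distance, bandwidth):
    """Unnormalized Gaussian kernel weight for a given distance (assumed kernel)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def knn_kde_posterior(query, examples, k=3, bandwidth=1.0):
    """Estimate P(label | query) from the k nearest labeled examples.

    `examples` is a list of (feature_vector, label) pairs. Each of the k
    nearest neighbors contributes a kernel weight to its label, and the
    weights are normalized to sum to 1. Hypothetical helper, not the
    paper's algorithm.
    """
    # Rank all examples by Euclidean distance to the query and keep k.
    neighbors = sorted(
        ((math.dist(query, features), label) for features, label in examples),
        key=lambda pair: pair[0],
    )[:k]

    # Accumulate kernel weights per label.
    weights = defaultdict(float)
    for distance, label in neighbors:
        weights[label] += gaussian_kernel(distance, bandwidth)

    total = sum(weights.values())
    if total == 0.0:
        # All kernel weights underflowed; fall back to a uniform estimate.
        return {label: 1.0 / len(weights) for label in weights}
    return {label: weight / total for label, weight in weights.items()}


# Toy data: 2-D feature vectors with made-up chunk labels.
examples = [
    ((0.0, 0.0), "B-NP"),
    ((0.1, 0.0), "B-NP"),
    ((1.0, 1.0), "I-NP"),
    ((5.0, 5.0), "O"),
]
posterior = knn_kde_posterior((0.0, 0.1), examples, k=3, bandwidth=1.0)
print(posterior)
```

In this toy setup the two nearby "B-NP" examples dominate the estimate, while the distant "O" example falls outside the k nearest neighbors and receives no probability mass. A smaller `bandwidth` would make the estimate depend even more strongly on the closest neighbors.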