In this work, we are concerned with coarse-grained semantic analysis over sparse data, labeling all nouns with a set of semantic categories. To exploit unlabeled data, we propose a bootstrapping framework with Maximum Entropy modeling (MaxEnt) as the statistical learning component. During the iterative tagging process, unlabeled data is used not only for better statistical estimation, but also as a medium for integrating non-statistical knowledge into model training. Two main issues are discussed in this paper. First, Association Rule principles are suggested to guide MaxEnt feature selection. Second, to guarantee convergence of the bootstrapping process, three adjusting strategies are proposed to soft-tag the unlabeled data.
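The bootstrapping loop described above can be sketched generically: train a MaxEnt (multinomial logistic) model on the labeled seed, soft-tag the unlabeled pool with the model's predicted class distributions, fold confident soft tags back into the training set, and retrain. The sketch below is a minimal illustration under assumed details; the function names, the confidence threshold, and the plain gradient-descent trainer are illustrative stand-ins and do not reproduce the paper's association-rule feature selection or its three specific adjusting strategies.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_maxent(X, Y, n_classes, lr=0.5, steps=200):
    """Fit a MaxEnt (multinomial logistic) model by gradient ascent
    on the log-likelihood. Y is a soft-label matrix: each row is a
    probability distribution over the classes, so soft tags from
    earlier bootstrap rounds are handled the same way as gold labels."""
    W = np.zeros((X.shape[1], n_classes))
    for _ in range(steps):
        P = softmax(X @ W)
        W += lr * X.T @ (Y - P) / len(X)  # log-likelihood gradient
    return W

def bootstrap(X_lab, y_lab, X_unlab, n_classes, rounds=5, threshold=0.9):
    """Self-training sketch: retrain MaxEnt, soft-tag unlabeled data,
    and promote confident predictions into the training pool.
    `threshold` is an assumed confidence cutoff, not from the paper."""
    Y_lab = np.eye(n_classes)[y_lab]          # one-hot gold labels
    X_pool, Y_pool = X_lab.copy(), Y_lab.copy()
    for _ in range(rounds):
        W = train_maxent(X_pool, Y_pool, n_classes)
        P = softmax(X_unlab @ W)              # soft tags for unlabeled nouns
        keep = P.max(axis=1) >= threshold     # only confident predictions
        if not keep.any():
            break
        X_pool = np.vstack([X_lab, X_unlab[keep]])
        Y_pool = np.vstack([Y_lab, P[keep]])  # keep the tags soft, not hard
    return W
```

Keeping the promoted tags as soft distributions, rather than collapsing them to hard labels, is one simple way to damp self-reinforcing errors during the iterations; the paper's adjusting strategies address this convergence problem more directly.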