In this work, we are concerned with coarse-grained semantic analysis over sparse data, labeling all nouns with a set of semantic categories. To exploit unlabeled data, we propose a bootstrapping framework with Maximum Entropy modeling (MaxEnt) as the statistical learning component. During the iterative tagging process, unlabeled data is used not only for better statistical estimation, but also as a medium for integrating non-statistical knowledge into model training. Two main issues are discussed in this paper. First, Association Rule principles are suggested to guide MaxEnt feature selection. Second, to guarantee convergence of the bootstrapping process, three adjusting strategies are proposed to soft-tag the unlabeled data.
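The bootstrapping loop described above can be sketched generically: train a MaxEnt (multinomial logistic) model on the labeled seed, soft-tag the unlabeled pool with the model's predicted class distributions, fold confident soft tags back into the training set, and retrain. The sketch below is a minimal illustration under assumed details; the function names, the confidence threshold, and the plain gradient-descent trainer are illustrative stand-ins and do not reproduce the paper's association-rule feature selection or its three specific adjusting strategies.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_maxent(X, Y, n_classes, lr=0.5, steps=200):
    """Fit a MaxEnt (multinomial logistic) model by gradient ascent
    on the log-likelihood. Y is a soft-label matrix: each row is a
    probability distribution over the classes, so soft tags from
    earlier bootstrap rounds are handled the same way as gold labels."""
    W = np.zeros((X.shape[1], n_classes))
    for _ in range(steps):
        P = softmax(X @ W)
        W += lr * X.T @ (Y - P) / len(X)  # log-likelihood gradient
    return W

def bootstrap(X_lab, y_lab, X_unlab, n_classes, rounds=5, threshold=0.9):
    """Self-training sketch: retrain MaxEnt, soft-tag unlabeled data,
    and promote confident predictions into the training pool.
    `threshold` is an assumed confidence cutoff, not from the paper."""
    Y_lab = np.eye(n_classes)[y_lab]          # one-hot gold labels
    X_pool, Y_pool = X_lab.copy(), Y_lab.copy()
    for _ in range(rounds):
        W = train_maxent(X_pool, Y_pool, n_classes)
        P = softmax(X_unlab @ W)              # soft tags for unlabeled nouns
        keep = P.max(axis=1) >= threshold     # only confident predictions
        if not keep.any():
            break
        X_pool = np.vstack([X_lab, X_unlab[keep]])
        Y_pool = np.vstack([Y_lab, P[keep]])  # keep the tags soft, not hard
    return W
```

Keeping the promoted tags as soft distributions, rather than collapsing them to hard labels, is one simple way to damp self-reinforcing errors during the iterations; the paper's adjusting strategies address this convergence problem more directly.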