Data sparseness, or overfitting, is a serious problem in natural language processing that employs machine learning methods. This remains true even for the maximum entropy (ME) method, whose flexible modeling capability has alleviated data sparseness more successfully than other probabilistic models in many NLP tasks. Although the ME method usually estimates the model so that it completely satisfies the equality constraints on feature expectations, complete satisfaction leads to undesirable overfitting, especially for sparse features, since constraints derived from a limited amount of training data are always uncertain. To control overfitting in ME estimation, we propose the use of box-type inequality constraints, in which equality can be violated up to certain predefined levels that reflect this uncertainty. The derived models, inequality ME models, in effect undergo regularized estimation with L1-norm penalties and bounded parameters. Most importantly, this regularized estimation allows the model parameters to become sparse. This can be thought of as automatic feature selection, which is expected to further improve generalization performance. We evaluate the inequality ME models on text categorization datasets and demonstrate their advantages over standard ME estimation, the similarly motivated Gaussian MAP estimation of ME models, and support vector machines (SVMs), a state-of-the-art method for text categorization.
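The sparsity effect described above can be illustrated with a toy sketch. This is an assumed, simplified stand-in, not the paper's estimation algorithm: a binary conditional maximum entropy model (logistic regression) trained with an L1 penalty via proximal gradient descent. The soft-thresholding step is the proximal operator of the L1 norm; it drives the weights of uninformative features to exactly zero, which is the "automatic feature selection" behavior the abstract attributes to the inequality ME models. The data, the penalty strength `c`, and the learning rate `lr` are all made up for the demonstration.

```python
# Illustrative sketch (assumed, not the paper's algorithm): L1-penalized
# binary maxent (logistic regression) trained by proximal gradient
# descent.  Soft thresholding drives weights of uninformative features
# to exactly zero, i.e. automatic feature selection.
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_l1_maxent(data, dim, c=0.1, lr=0.1, epochs=500):
    """data: list of (feature_vector, label) pairs, label in {0, 1}."""
    w = [0.0] * dim
    for _ in range(epochs):
        # Mean gradient of the negative log-likelihood.
        grad = [0.0] * dim
        for x, y in data:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j in range(dim):
                grad[j] += (p - y) * x[j] / len(data)
        # Gradient step, then soft thresholding (proximal operator of
        # the L1 norm): weights whose average gradient stays below the
        # penalty c settle at exactly zero.
        for j in range(dim):
            v = w[j] - lr * grad[j]
            w[j] = math.copysign(max(abs(v) - lr * c, 0.0), v)
    return w

random.seed(0)
# Toy data: only the first 2 of 10 features carry signal.
data = []
for _ in range(200):
    x = [random.gauss(0.0, 1.0) for _ in range(10)]
    data.append((x, 1 if x[0] + x[1] > 0 else 0))

w = train_l1_maxent(data, dim=10)
print("nonzero features:", [j for j, wj in enumerate(w) if wj != 0.0])
```

With the L1 penalty, most of the eight noise features end up with weights of exactly zero, whereas a Gaussian (L2) penalty, like the Gaussian MAP estimation mentioned above, would only shrink them toward zero without eliminating them.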