Approximate Maximum Entropy Joint Feature Inference Consistent with Arbitrary Lower-Order Probability Constraints: Application to Statistical Classification

  • Authors:
  • David J. Miller; Lian Yan

  • Affiliations:
  • Department of Electrical Engineering, The Pennsylvania State University, University Park, PA 16802, U.S.A. (both authors)

  • Venue:
  • Neural Computation
  • Year:
  • 2000

Abstract

We propose a new learning method for discrete-space statistical classifiers. Following Chow and Liu (1968) and Cheeseman (1983), we cast classification and inference within the more general framework of estimating the joint probability mass function (p.m.f.) for the (feature vector, class label) pair. Cheeseman's proposal to build the maximum entropy (ME) joint p.m.f. consistent with general lower-order probability constraints is in principle powerful, allowing general dependencies between features. However, enormous learning complexity has severely limited the use of this approach. Alternative models such as Bayesian networks (BNs) require explicit determination of conditional independencies, which may be difficult to assess given limited data. Here we propose an approximate ME method that, like previous methods, incorporates general constraints while keeping learning quite tractable. The new method restricts the support of the joint p.m.f. during learning to a small subset of the full feature space. Classification gains are realized over dependence trees, tree-augmented naive Bayes networks, BNs trained by the Kutato algorithm, and multilayer perceptrons. Extensions to more general inference problems are indicated. We also propose a novel exact inference method for the case of several missing features.
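For context, the ME estimation problem referenced in the abstract (in the spirit of Cheeseman, 1983) can be sketched as a constrained optimization. The notation below is an illustrative sketch and is not taken from the paper itself.

\begin{align*}
\max_{P}\;\; & H(P) \;=\; -\sum_{\mathbf{x},\,c} P(\mathbf{x},c)\,\log P(\mathbf{x},c) \\
\text{subject to}\;\; & \sum_{\mathbf{x},\,c} P(\mathbf{x},c)\, f_k(\mathbf{x},c) \;=\; \hat{p}_k, \qquad k = 1,\dots,K, \\
& \sum_{\mathbf{x},\,c} P(\mathbf{x},c) \;=\; 1, \qquad P(\mathbf{x},c) \ge 0,
\end{align*}

Here each $f_k$ is the indicator of a lower-order event (e.g., a particular pairwise feature-value configuration) and $\hat{p}_k$ is its empirical probability; these are assumed symbols for illustration. The Lagrangian solution takes the exponential (Gibbs) form $P(\mathbf{x},c) \propto \exp\bigl(\sum_k \lambda_k f_k(\mathbf{x},c)\bigr)$. The difficulty noted in the abstract is that computing such a model exactly over the full feature space is generally intractable, which motivates the paper's restriction of the p.m.f. support to a small subset of that space during learning.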