Probabilistic reasoning in intelligent systems: networks of plausible inference
Elements of information theory
Modern heuristic techniques for combinatorial problems
The EM algorithm for graphical association models with missing data. Computational Statistics & Data Analysis - Special issue dedicated to Tomáš Havránek
Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables. Machine Learning - Special issue on learning with probabilistic representations
Deterministic annealing EM algorithm. Neural Networks
Estimating dependency structure as a hidden variable. NIPS '97 Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems 10
An introduction to variational methods for graphical models. Learning in graphical models
A tutorial on learning with Bayesian networks. Learning in graphical models
A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in graphical models
Neural Networks for Pattern Recognition
UAI '01 Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence
Learning Belief Networks in the Presence of Missing Values and Hidden Variables. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Hidden Markov Model Induction by Bayesian Model Merging. Advances in Neural Information Processing Systems 5 [NIPS Conference]
Refinement and coarsening of Bayesian networks. UAI '90 Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence
Multivariate Information Bottleneck. UAI '01 Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence
Learning the Dimensionality of Hidden Variables. UAI '01 Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence
Data perturbation for escaping local maxima in learning. Eighteenth National Conference on Artificial Intelligence
Distributional clustering of English words. ACL '93 Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics
Hierarchical Latent Class Models for Cluster Analysis. The Journal of Machine Learning Research
Annealing techniques for unsupervised statistical language learning. ACL '04 Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics
Discovering the hidden structure of complex dynamic systems. UAI '99 Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence
Continuation methods for mixing heterogeneous sources. UAI '02 Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence
Learning mixtures of DAG models. UAI '98 Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence
Score and information for recursive exponential models with incomplete data. UAI '97 Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence
Learning equivalence classes of Bayesian network structures. UAI '96 Proceedings of the Twelfth International Conference on Uncertainty in Artificial Intelligence
Annealing structural bias in multilingual weighted grammar induction. ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics
Incremental learning of cognitive concepts: a hidden variable networks approach. PCAR '06 Proceedings of the 2006 International Symposium on Practical Cognitive Agents and Robots
Minimum risk annealing for training log-linear models. COLING-ACL '06 Proceedings of the COLING/ACL Main Conference Poster Sessions
Statistical predicate invention. Proceedings of the 24th International Conference on Machine Learning
First-Order Probabilistic Languages: Into the Unknown. Inductive Logic Programming
A Bayesian Approach to Attention Control and Concept Abstraction. Attention in Cognitive Systems. Theories and Systems from an Interdisciplinary Viewpoint
Symmetry breaking in soft clustering decoding of neural codes. IEEE Transactions on Information Theory - Special issue on information theory in molecular biology and neuroscience
Pattern Recognition Letters
Learning Latent Tree Graphical Models. The Journal of Machine Learning Research
Ancestor relations in the presence of unobserved variables. ECML PKDD '11 Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
The role of operation granularity in search-based learning of latent tree models. JSAI-isAI '10 Proceedings of the 2010 International Conference on New Frontiers in Artificial Intelligence
Individual and group performance of computerized educational tasks. Education and Information Technologies
Expectation maximization over binary decision diagrams for probabilistic logic programs. Intelligent Data Analysis
A central challenge in learning probabilistic graphical models is dealing with domains that involve hidden variables. The common approach for learning model parameters in such domains is the expectation maximization (EM) algorithm. This algorithm, however, can easily get trapped in sub-optimal local maxima. Learning the model structure is even more challenging. The structural EM algorithm can adapt the structure in the presence of hidden variables, but usually performs poorly without prior knowledge about the cardinality and location of the hidden variables. In this work, we present a general approach for learning Bayesian networks with hidden variables that overcomes these problems. The approach builds on the information bottleneck framework of Tishby et al. (1999). We start by proving a formal correspondence between the information bottleneck objective and the standard parametric EM functional. We then use this correspondence to construct a learning algorithm that combines an information-theoretic smoothing term with a continuation procedure. Intuitively, the algorithm bypasses local maxima and achieves superior solutions by following a continuous path from a solution of an easy, smooth target function to a solution of the desired likelihood function. As we show, our algorithmic framework allows learning of the parameters as well as the structure of a network. It also allows us to introduce new hidden variables during model selection and to learn their cardinality. We demonstrate the performance of our procedure on several challenging real-life data sets.
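For concreteness, the two functionals in the abstract's correspondence are the information bottleneck objective of Tishby et al. (1999), which minimizes I(T;X) - beta * I(T;Y) over a compression variable T, and the parametric EM functional in the Neal and Hinton formulation cited above, F(Q, theta) = E_Q[log P(X, H | theta)] + H(Q); the continuation procedure follows a path in the trade-off parameter from a smooth, easy objective to the true likelihood. The sketch below illustrates that continuation idea only and is not the paper's IB-EM algorithm: it runs deterministic-annealing-style EM (cf. the "Deterministic annealing EM algorithm" reference above) on a toy one-dimensional Gaussian mixture, ramping an inverse temperature beta from near 0, where responsibilities are nearly uniform and the objective is smooth, to 1, where the updates coincide with ordinary EM on the likelihood. The function names and the toy data are illustrative assumptions.

# A minimal sketch of the continuation/annealing idea on a toy 1-D
# Gaussian mixture (deterministic-annealing-style EM, not the paper's IB-EM).
import numpy as np

def annealed_em(x, k, betas, n_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)   # component means
    var = np.full(k, np.var(x))                 # component variances
    pi = np.full(k, 1.0 / k)                    # mixing weights
    for beta in betas:                          # continuation: beta ~0 -> 1
        for _ in range(n_steps):
            # E-step: responsibilities from component log-densities scaled
            # by beta; small beta flattens the posterior, smoothing the
            # objective that the M-step effectively optimizes.
            log_p = (np.log(pi)
                     - 0.5 * np.log(2.0 * np.pi * var)
                     - 0.5 * (x[:, None] - mu) ** 2 / var)
            log_r = beta * log_p
            log_r -= log_r.max(axis=1, keepdims=True)
            r = np.exp(log_r)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: standard weighted maximum-likelihood updates.
            nk = r.sum(axis=0) + 1e-12
            mu = (r * x[:, None]).sum(axis=0) / nk
            var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
            pi = nk / len(x)
    return pi, mu, var

# Toy usage: two well-separated clusters; beta ramps from 0.05 to 1.0.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])
pi_hat, mu_hat, var_hat = annealed_em(x, k=2, betas=np.linspace(0.05, 1.0, 8))
print(pi_hat, mu_hat, var_hat)

With a gentle beta schedule the responsibilities stay soft in the early stages, letting the means move across basins in which plain EM (beta fixed at 1) can get trapped; only as beta approaches 1 does the surrogate objective sharpen into the true likelihood.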