Sequence Modeling with Mixtures of Conditional Maximum Entropy Distributions
ICDM '03: Proceedings of the Third IEEE International Conference on Data Mining
We present a novel approach to modeling sequences using mixtures of conditional maximum entropy (maxent) distributions. Our method generalizes the mixture of first-order Markov models by including "long-term" dependencies in the model components. These "long-term" dependencies are represented by probabilistic triggers, or rules, frequently used in the natural language processing (NLP) domain (such as "A occurred k positions back" ⇒ "the current symbol is B" with probability P). The maxent framework is then used to create a coherent global probabilistic model from all selected triggers. In this paper, we enhance this formalism by using probabilistic mixtures with maxent models as components, thus representing hidden or unobserved effects in the data. We demonstrate how our mixture of conditional maxent models can be learned from data using the generalized EM algorithm, which scales linearly in the dimensionality of the data and the number of mixture components. We present empirical results on simulated and real-world data sets and demonstrate that the proposed approach yields better-quality models than mixtures of first-order Markov models, while resisting the overfitting and curse of dimensionality that would inevitably afflict higher-order Markov models.
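The ideas in the abstract can be sketched in code: a binary trigger feature ("A occurred k positions back" and the current symbol is B), a conditional maxent distribution P(y | history) ∝ exp(Σ_i w_i f_i(history, y)) built from such features, and one generalized-EM step for a mixture of these models. This is a minimal illustration, not the paper's implementation: all class and function names are hypothetical, and the partial M-step here is a single weighted gradient-ascent update, whereas the paper may use a different optimizer.

```python
import numpy as np

def trigger_feature(history, y, A, B, k):
    """Fires (1.0) when symbol A occurred k positions back and the
    current symbol is B; otherwise 0.0. Hypothetical helper."""
    return 1.0 if len(history) >= k and history[-k] == A and y == B else 0.0

class ConditionalMaxent:
    """Conditional maxent model P(y | history) over n_symbols outcomes,
    parameterized by one weight per (trigger) feature."""
    def __init__(self, n_symbols, features):
        self.n_symbols = n_symbols
        self.features = features          # list of f(history, y) -> {0, 1}
        self.w = np.zeros(len(features))  # feature weights (lambdas)

    def feat_vec(self, history, y):
        return np.array([f(history, y) for f in self.features])

    def prob(self, history):
        scores = np.array([self.w @ self.feat_vec(history, y)
                           for y in range(self.n_symbols)])
        scores -= scores.max()            # numerical stability
        p = np.exp(scores)
        return p / p.sum()

def gem_step(models, priors, sequences, lr=0.1):
    """One generalized-EM step for a mixture of conditional maxent models.
    E-step: posterior responsibility of each component for each sequence.
    Partial M-step: update mixing priors and take one responsibility-weighted
    gradient step on each component's weights."""
    M = len(models)
    resp = np.zeros((len(sequences), M))
    for s, seq in enumerate(sequences):
        for m, model in enumerate(models):
            ll = sum(np.log(model.prob(seq[:t])[seq[t]])
                     for t in range(1, len(seq)))
            resp[s, m] = np.log(priors[m]) + ll
        resp[s] -= resp[s].max()          # log-sum-exp trick
        resp[s] = np.exp(resp[s])
        resp[s] /= resp[s].sum()
    new_priors = resp.mean(axis=0)
    for m, model in enumerate(models):
        grad = np.zeros_like(model.w)
        for s, seq in enumerate(sequences):
            for t in range(1, len(seq)):
                h = seq[:t]
                p = model.prob(h)
                # gradient = observed features minus model-expected features
                expected = sum(p[y] * model.feat_vec(h, y)
                               for y in range(model.n_symbols))
                grad += resp[s, m] * (model.feat_vec(h, seq[t]) - expected)
        model.w += lr * grad
    return new_priors
```

With all weights at zero, each conditional distribution is uniform; repeated `gem_step` calls increase the weighted data likelihood, and the updated priors always remain a valid distribution over components.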