Sequence Modeling with Mixtures of Conditional Maximum Entropy Distributions
ICDM '03: Proceedings of the Third IEEE International Conference on Data Mining
We present a novel approach to modeling sequences using mixtures of conditional maximum entropy (maxent) distributions. Our method generalizes the mixture of first-order Markov models by including "long-term" dependencies in the model components. These "long-term" dependencies are represented by probabilistic triggers, or rules, frequently used in the natural language processing (NLP) domain (such as "A occurred k positions back" ⇒ "the current symbol is B" with probability P). The maxent framework is then used to create a coherent global probabilistic model from all selected triggers. In this paper, we enhance this formalism by using probabilistic mixtures with maxent models as components, thus representing hidden or unobserved effects in the data. We demonstrate how our mixture of conditional maxent models can be learned from data using the generalized EM algorithm, which scales linearly in the dimensionality of the data and the number of mixture components. We present empirical results on simulated and real-world data sets and demonstrate that the proposed approach yields better-quality models than mixtures of first-order Markov models, while resisting the overfitting and curse of dimensionality that would inevitably afflict higher-order Markov models.
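The ideas in the abstract can be sketched in code: a binary trigger feature ("A occurred k positions back" and the current symbol is B), a conditional maxent distribution P(y | history) ∝ exp(Σ_i w_i f_i(history, y)) built from such features, and one generalized-EM step for a mixture of these models. This is a minimal illustration, not the paper's implementation: all class and function names are hypothetical, and the partial M-step here is a single weighted gradient-ascent update, whereas the paper may use a different optimizer.

```python
import numpy as np

def trigger_feature(history, y, A, B, k):
    """Fires (1.0) when symbol A occurred k positions back and the
    current symbol is B; otherwise 0.0. Hypothetical helper."""
    return 1.0 if len(history) >= k and history[-k] == A and y == B else 0.0

class ConditionalMaxent:
    """Conditional maxent model P(y | history) over n_symbols outcomes,
    parameterized by one weight per (trigger) feature."""
    def __init__(self, n_symbols, features):
        self.n_symbols = n_symbols
        self.features = features          # list of f(history, y) -> {0, 1}
        self.w = np.zeros(len(features))  # feature weights (lambdas)

    def feat_vec(self, history, y):
        return np.array([f(history, y) for f in self.features])

    def prob(self, history):
        scores = np.array([self.w @ self.feat_vec(history, y)
                           for y in range(self.n_symbols)])
        scores -= scores.max()            # numerical stability
        p = np.exp(scores)
        return p / p.sum()

def gem_step(models, priors, sequences, lr=0.1):
    """One generalized-EM step for a mixture of conditional maxent models.
    E-step: posterior responsibility of each component for each sequence.
    Partial M-step: update mixing priors and take one responsibility-weighted
    gradient step on each component's weights."""
    M = len(models)
    resp = np.zeros((len(sequences), M))
    for s, seq in enumerate(sequences):
        for m, model in enumerate(models):
            ll = sum(np.log(model.prob(seq[:t])[seq[t]])
                     for t in range(1, len(seq)))
            resp[s, m] = np.log(priors[m]) + ll
        resp[s] -= resp[s].max()          # log-sum-exp trick
        resp[s] = np.exp(resp[s])
        resp[s] /= resp[s].sum()
    new_priors = resp.mean(axis=0)
    for m, model in enumerate(models):
        grad = np.zeros_like(model.w)
        for s, seq in enumerate(sequences):
            for t in range(1, len(seq)):
                h = seq[:t]
                p = model.prob(h)
                # gradient = observed features minus model-expected features
                expected = sum(p[y] * model.feat_vec(h, y)
                               for y in range(model.n_symbols))
                grad += resp[s, m] * (model.feat_vec(h, seq[t]) - expected)
        model.w += lr * grad
    return new_priors
```

With all weights at zero, each conditional distribution is uniform; repeated `gem_step` calls increase the weighted data likelihood, and the updated priors always remain a valid distribution over components.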