We develop the distance dependent Chinese restaurant process, a flexible class of distributions over partitions that allows for dependencies between the elements. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies arising from time, space, and network connectivity. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both fully observed and latent mixture settings. We study its empirical performance with three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data and network data. We also show that the distance dependent CRP representation of the traditional CRP mixture leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation.
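To make the construction concrete, here is a minimal sketch (not the authors' reference implementation) of the distance dependent CRP prior: each customer links to another customer with probability proportional to a decay function of their distance, or to itself with probability proportional to a concentration parameter, and the partition is read off as the connected components of the link graph. The function names `ddcrp_links` and `clusters_from_links` are hypothetical, and `decay`/`alpha` stand in for the decay function and concentration parameter described in the abstract; `alpha` is assumed positive.

```python
import random

def ddcrp_links(distances, alpha, decay, seed=0):
    """Sample customer-to-customer links from a distance dependent CRP prior.

    Customer i links to customer j with probability proportional to
    decay(distances[i][j]), or to itself with probability proportional to
    alpha (assumed > 0). The partition is not drawn directly: it is induced
    by the link graph, which is what breaks exchangeability.
    """
    rng = random.Random(seed)
    n = len(distances)
    links = []
    for i in range(n):
        weights = [decay(distances[i][j]) for j in range(n)]
        weights[i] = alpha  # self-link mass plays the role of the CRP concentration
        u = rng.random() * sum(weights)
        acc = 0.0
        for j, w in enumerate(weights):
            acc += w
            if u < acc:
                links.append(j)
                break
    return links

def clusters_from_links(links):
    """Recover the partition as connected components of the undirected link graph."""
    parent = list(range(len(links)))

    def find(x):  # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(links))]
```

For time-ordered data one can take `distances[i][j] = i - j` with a window decay such as `decay = lambda d: 1.0 if 0 < d <= 3 else 0.0`, so customers only link backward within a window; a decay that is identically 1 over all previous customers recovers the traditional (exchangeable) CRP as a special case.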