The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies

  • Authors:
  • David M. Blei; Thomas L. Griffiths; Michael I. Jordan

  • Affiliations:
  • Princeton University, Princeton, New Jersey; University of California, Berkeley, California; University of California, Berkeley, California

  • Venue:
  • Journal of the ACM (JACM)
  • Year:
  • 2010

Abstract

We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP lead to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning: the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
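
To make the "paths down a random tree" idea concrete, here is a minimal sketch of drawing document paths from an nCRP prior. It is not the authors' implementation, and it rests on assumptions the abstract does not state: the tree is truncated to a finite `depth`, a single concentration parameter `gamma` governs every node, and the function name `sample_ncrp_paths` is hypothetical. At each node a document follows an existing branch with probability proportional to the number of earlier documents that took it, or opens a new branch with probability proportional to `gamma`; this is the preferential attachment behaviour the abstract refers to.

```python
import random
from collections import defaultdict


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Draw one root-to-leaf path per document from a depth-truncated nCRP.

    Assumptions (not from the abstract): finite depth, one shared gamma > 0.
    A node is identified by the tuple of child indices leading to it.
    """
    rng = random.Random(seed)
    # counts[node][k] = number of documents that moved from `node` to child k
    counts = defaultdict(dict)
    paths = []
    for _ in range(num_docs):
        node = ()  # the root is the empty tuple
        for _ in range(depth):
            children = counts[node]
            total = sum(children.values())
            # Chinese restaurant process step at this node:
            # existing child k with weight n_k, a new child with weight gamma.
            r = rng.uniform(0, total + gamma)
            chosen = len(children)  # default: open a new branch
            acc = 0.0
            for k, n in children.items():
                acc += n
                if r < acc:
                    chosen = k
                    break
            children[chosen] = children.get(chosen, 0) + 1
            node = node + (chosen,)
        paths.append(node)
    return paths


if __name__ == "__main__":
    for path in sample_ncrp_paths(num_docs=10, depth=3, gamma=1.0):
        print(path)
```

Documents whose paths share a prefix would share the topics attached to those upper nodes, which is how the clustering at multiple levels of abstraction arises. Smaller values of `gamma` tend to produce fewer branches. The sketch covers only the prior over paths; the posterior inference algorithm mentioned in the abstract is not shown.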