Quick overview:
We show how to obtain the Chinese Restaurant Process (CRP) as the limit of a family of finite Dirichlet mixtures. But before we start I would like to point the reader to a few references that I enjoyed and found useful.
- D. Aldous, Exchangeability and related topics (1983).
- C. Robert, G. Casella, Monte Carlo Statistical Methods, Technometrics, 42 (4), pp. 430-431
- R. Neal, Markov Chain Sampling Methods for Dirichlet Process Mixture Models, Journal of Computational and Graphical Statistics Vol. 9, No. 2 (2000), pp. 249-265.
- T. Broderick, https://people.csail.mit.edu/tbroderick/tutorials.html
Dirichlet Mixtures
Choose a concentration parameter $\alpha > 0$ and consider the following generative process for seating (or cluster) assignments $c_1,\ldots, c_t$ of $t$ customers to a fixed number of $k$ tables (or clusters) identified with $1,\ldots,k$ (I am using the terms clusters and tables interchangably, the same holds for cluster- and seating assignments):
\[\begin{eqnarray*} p = (p_1,\ldots,p_k) &\sim& \text{Dirichlet} (\tfrac{\alpha}{k}, \ldots, \tfrac{\alpha}{k}) \\ c_1, \ldots, c_t &\sim& \text{Categorical}(p_1,\ldots,p_k). \end{eqnarray*}\]By marginalizing over the mixture probabilities $p_i$ we compute the posterior
\[\tag{DIR} \begin{eqnarray*} p(c_{t} = i \mid c_1,\ldots,c_{t-1}; \alpha, k) = ... = \frac{\tfrac{\alpha}{k} + n_{i,t-1}}{\alpha + t - 1}, \end{eqnarray*}\]where $n_{i,t}$ denotes the size of cluster $i$ at time $t$, that is \(n_{i,t} = \vert\{ c_s : c_s = i, s\leq t \}\vert\).
CRP as Limit of Dirichlet Mixtures
As we will see below when we consider the limit $k\to \infty$ it will be helpful to group the outcomes of chosing any of the empty clusters (i.e. choosing $c_t = i$ with $n_{i,t-1}=0$ or put differently $c_t \neq c_s, \text{ for all } s < t$) into a single event that marks the start of a new cluster. If we denote by $k_t$ the number of non-empty clusters up at time $t$ (after sampling $c_t$) the number of empty clusters (those with $n_{c,t-1}=0$) is obviously $k - k_{t-1}$ and we have
\[\begin{eqnarray*} p(c_{t} | c_1,\ldots,c_{t-1}; \alpha, k) \propto \begin{cases} \tfrac{\alpha}{k} + n_{c_s,t-1} & \text{if $c_t = c_s$ for some $ s < t$,} \\ \tfrac{\alpha}{k} (k - k_{t-1}) & \text{if $c_{t} \neq c_s$ for all $s < t$}. \end{cases} \end{eqnarray*}\]In the limit $k \to \infty$ this yields
\[\tag{CRP} p(c_{t} | c_1,\ldots,c_{t-1}; \alpha) \propto \begin{cases} n_{c_s,t-1} & \text{if $c_t = c_s$ for some $ s < t$}, \\ \alpha & \text{if $c_{t} \neq c_s$ for all $s < t$}. \end{cases}\]The normalizing constant in the equation above is \(\alpha + t-1\). For the lack of a better term I call this probability the CRP kernel for the remainder of this tutorial. With this in hand we can now define a distribution over table assignments called the Chinese Restauran Process (CRP)
\[\label{eq:crp} \tag{CRP'} p(c_1,\ldots,c_T \mid \alpha) := \prod_{t=1}^T p(c_t \mid c_{t-1}, \ldots, c_{1}; \alpha).\]Although the CRP is defined sequentially, the order the customers sit down at their respective tables does not change the probablity of asingment — this corresponds to the exchangability of the process. This follows from rewriting the probability in terms of the table sizes $n_i$ which do not contain any sequential information: It is not hard to see that we have
\[p(c_1,\ldots,c_T \mid \alpha) = \frac{1}{\alpha \cdot (\alpha +1) \cdots (\alpha + T -1)} \cdot \alpha^{k} \cdot \prod_{i=1}^{k} (n_i -1)!.\]