This is a preliminary version. The algorithm was originally proposed in (Metropolis et al., 1953) and later extended in (Hastings, 1970). For an exhaustive description of the algorithm and other MC methods I refer the reader to (Robert & Casella, 2005).
Quick overview:
- Defns of detailed balance and stationary distribution of a Markov chain.
- Detailed balance implies stationarity.
- Defn of MH Kernel.
- MH Kernel satisfies detailed balance: formal and intuitive proof.
Prelims
We start with a couple of definitions to set the stage for the algorithm and why it works. Recall that a stationary distribution $\pi$ of the chain $K$ is characterized by
\[\pi(Y) = \int_X \pi(x) \ K(x,Y) \ dx,\]for all events $Y$. Note that in the discrete finite case, where $K$ is just a transition matrix, this translates to
\[\pi = \pi K.\]Moreover, for the Markov chain $(X_n)$ defined by $X_n \sim K(X_{n-1}, \cdot)$, if $X_0 \sim \pi$ then $X_n \sim \pi$ for every $n$. What is a bit more involved is showing that no matter the initial distribution the chain converges to $\pi$ (cf. (Robert & Casella, 2005)), i.e. showing that for any state $x$ (and for some appropriate norm) we have
\[\lim_{n\to\infty} \| K^n(x,.) - \pi \| \stackrel{!}{=} 0.\]A Markov chain with transition kernel $K$ satisfies the detailed balance condition if there is a function $f$ such that for every $x,y$
\[f(x) \ K(x,y) = f(y) \ K(y,x).\]If $f$ is a probability density – as it will be in the MH case below – then detailed balance implies that the chain is reversible and that $f$ is a stationary distribution of the chain. Simply compute
\[\begin{eqnarray*} \int_X f(x) \ K(x,Y) \ dx &=& \int_X \int_Y f(x) \ K(x,y) \ dy \ dx \\ &=& \int_X \int_Y f(y) \ K(y,x) \ dy \ dx \\ &=& \int_Y f(y) \ \Big[ \int_X K(y, x) \ dx \Big] \ dy \\ &=& \int_Y \ f(y) \ dy \\ &=& f(Y) . \end{eqnarray*}\]
Metropolis-Hastings transition kernel
Given a distribution $p(x)$ we will now construct a Markov chain $(X_1, X_2, \ldots)$ with stationary distribution $p(x)$. For a family of proposal distributions $q(y \mid x)$ we define the acceptance probability (of a proposed state transition $x \to y$) as
\[a(x, y) := \min \Big\{ 1, \frac{q(x \mid y)}{q(y \mid x)} \frac{p(y)}{p(x)} \Big\}.\]As the name suggests, $a(x,y)$ guides which proposals to accept and which to reject. The way it is set up ensures that $p$ is the stationary distribution of the chain. But we are getting ahead of ourselves. Let us define the Markov chain first.
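In code, the acceptance probability might look as follows – a minimal sketch, assuming a 1-D standard-normal target $p$ and a Gaussian random-walk proposal $q(y \mid x) = \mathcal{N}(y; x, \sigma^2)$, both of which are illustrative choices not taken from the text:

```python
import math

def log_p(x):
    """Unnormalised log-density of the (assumed) target: standard normal."""
    return -0.5 * x * x

def log_q(y, x, sigma=1.0):
    """Log-density of the (assumed) proposal q(y | x) = N(y; x, sigma^2)."""
    return -0.5 * ((y - x) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def acceptance(x, y):
    """a(x, y) = min{ 1, [q(x|y) p(y)] / [q(y|x) p(x)] }, computed in log-space
    for numerical stability."""
    log_ratio = (log_q(x, y) - log_q(y, x)) + (log_p(y) - log_p(x))
    return min(1.0, math.exp(log_ratio))
```

Since this particular proposal is symmetric, $q(x \mid y) = q(y \mid x)$, the $q$-ratio cancels and the acceptance probability reduces to $\min\{1, p(y)/p(x)\}$ – the original Metropolis rule.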
Pick an initial state $x_0$ and sample $x_1, x_2, x_3, \ldots$ by iterating the following process for $t \geq 1$:
\[\left.\begin{aligned} y_t \ & \sim \ q(y \mid x_{t-1}) \\ a_t \ & \sim \ \text{Bernoulli}\big(a({x_{t-1}, y_t}) \big) \end{aligned}\right. \ \ \ \ \longrightarrow \ \ \ \ \ x_t := \begin{cases} y_t & \text{if } a_t = 1, \\ x_{t-1} & \text{otherwise.} \end{cases}\]We can also write down the MH kernel explicitly: let $r(x) = 1 - \int a(x,\tilde{y}) \ q(\tilde{y} \mid x) \ d\tilde{y}$ denote the probability of ending up in the second branch of the case distinction above, or equivalently the probability that the proposal is rejected. Then we have
\[K(x,y) = a(x,y) \ q(y \mid x) + r(x) \ \delta_x(y).\]
Detailed balance for MH kernel
The above Markov chain satisfies the detailed balance condition: for $x = y$ detailed balance holds trivially, and this is the only case in which the second term $r(x) \, \delta_x(y)$ is non-vanishing. It remains to show detailed balance for the first term, which can be written as
\[\begin{eqnarray*} p(x) \ \big( a(x,y) \ q(y \mid x) \big) &=& \min \Big\{ 1, \frac{q(x \mid y) \ p(y)}{q(y \mid x) \ p(x)} \Big\} \ q(y \mid x) \ p(x) \\ &=& \min \Big\{ q(y \mid x) \ p(x), q(x \mid y) \ p(y) \Big\}. \end{eqnarray*}\]The expression on the right side is clearly symmetric in $x$ and $y$ and thus implies detailed balance.
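The detailed balance and stationarity identities above can be checked numerically on a small finite state space. The target $p$ and proposal matrix $q$ below are arbitrary illustrative choices; the kernel is assembled exactly as $K(x,y) = a(x,y)\, q(y \mid x) + r(x)\, \delta_x(y)$:

```python
# Numerical sanity check of p(x) K(x,y) = p(y) K(y,x) and pi = pi K
# on a 3-state space.  The numbers below are arbitrary choices.

p = [0.2, 0.5, 0.3]                        # target distribution
q = [[0.1, 0.6, 0.3],                      # q[x][y] = q(y | x); rows sum to 1
     [0.4, 0.2, 0.4],
     [0.5, 0.25, 0.25]]
n = len(p)

def a(x, y):
    """Acceptance probability a(x, y) = min{1, q(x|y) p(y) / (q(y|x) p(x))}."""
    return min(1.0, (q[y][x] * p[y]) / (q[x][y] * p[x]))

# First term: a(x, y) q(y | x) for every pair (x, y) ...
K = [[a(x, y) * q[x][y] for y in range(n)] for x in range(n)]
# ... then add the rejection mass r(x) = 1 - sum_y a(x, y) q(y | x)
# to the diagonal (the delta term).
for x in range(n):
    K[x][x] += 1.0 - sum(K[x])

# Detailed balance: p(x) K(x, y) = p(y) K(y, x)
for x in range(n):
    for y in range(n):
        assert abs(p[x] * K[x][y] - p[y] * K[y][x]) < 1e-12

# Stationarity: (pi K)(y) = pi(y)
for y in range(n):
    assert abs(sum(p[x] * K[x][y] for x in range(n)) - p[y]) < 1e-12
```

The check passes for any valid $p$ and $q$, which mirrors the proof: the product $p(x)\, a(x,y)\, q(y \mid x) = \min\{q(y \mid x)\, p(x),\ q(x \mid y)\, p(y)\}$ is symmetric in $x$ and $y$ by construction.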
Intuition
In the acceptance probability we consider the ratio of the probability flow of the reversed transition, $p(y \to x) := p(y) \, q(x \mid y)$, over that of the proposed transition, $p(x \to y) := p(x) \, q(y \mid x)$, and we distinguish two cases:
\[\text{(i)} \ \ a=1 \ \ \ \text{and} \ \ \ \text{(ii)} \ \ a<1 .\]The first case translates to $p(y \to x) \geq p(x \to y)$. In order to reestablish balance we have to increase the denominator $p(x \to y)$, which means we want to accept the proposed transition. Conversely, the second case translates to $p(y \to x) < p(x \to y)$, and we want to reject the proposed transition (in proportion to how far we are out of balance) to avoid amplifying the imbalance.
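Putting everything together, the MH iteration defined above can be sketched in a few lines of Python. As before, the standard-normal target and the symmetric Gaussian random-walk proposal (for which the $q$-ratio in $a(x,y)$ cancels) are illustrative assumptions, as are the step size and sample count:

```python
import math
import random

def metropolis_hastings(log_p, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis sampler targeting exp(log_p) (unnormalised).
    A minimal sketch; the symmetric proposal makes the q-ratio cancel."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)            # propose y ~ q(. | x)
        a = min(1.0, math.exp(log_p(y) - log_p(x)))  # acceptance prob a(x, y)
        if rng.random() < a:                    # a ~ Bernoulli(a(x, y))
            x = y                               # accept: move to y
        samples.append(x)                       # reject: stay at x
    return samples

# Sample from a standard normal and inspect the first two moments.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With enough steps the empirical mean and variance should approach $0$ and $1$, the moments of the target – a quick, informal check that the chain indeed has $p$ as its stationary distribution.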
References
- Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics, 21(6), 1087–1092. https://doi.org/10.1063/1.1699114
- Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97–109. https://doi.org/10.1093/biomet/57.1.97
- Robert, C. P., & Casella, G. (2005). Monte Carlo Statistical Methods (Springer Texts in Statistics). Springer-Verlag.