Maxent as KL minimisation with improper priors

27 Jan 2020

Very simple identity that’s worth making a quick note of.

Let’s assume we’ve got a regularised expected utility problem of the following form (cf. the ELBO):

\[\begin{align} \max_{q(\theta)}[\mathbb{E}_{q(\theta)}[J(\theta)] - \alpha D_{KL}[q(\theta)||q_0(\theta)] ]. \end{align}\]
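To make the objective concrete, here’s a minimal sketch over a finite set of \(\theta\) values; the utility values `J`, the reference `q0`, and `alpha` are all hypothetical placeholders:

```python
import numpy as np
from scipy.special import rel_entr

n = 5
J = np.array([1.0, 2.0, 0.5, 3.0, 1.5])  # hypothetical utilities J(theta)
q0 = np.full(n, 1.0 / n)                 # reference distribution q_0
alpha = 0.7                              # regularisation strength

def objective(q):
    """Regularised expected utility: E_q[J] - alpha * KL(q || q0)."""
    return q @ J - alpha * rel_entr(q, q0).sum()

q = np.full(n, 1.0 / n)
print(objective(q))  # at q = q0 the KL term vanishes, leaving E_q[J]
```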

We write out the KL regularisation term:

\[\begin{align} D_{KL}[q(\theta)||q_0(\theta)] &= \int q(\theta) \log \frac{q(\theta)}{q_0(\theta)} \mathrm{d}\theta\\ &= - \int q(\theta) \log q_0(\theta) \mathrm{d}\theta + \int q(\theta) \log q(\theta) \mathrm{d}\theta\\ &= H(q,q_0) - H(q). \end{align}\]
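A quick numerical sanity check of this decomposition, using a pair of arbitrary discrete distributions:

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.dirichlet(np.ones(5))    # arbitrary discrete q
q0 = rng.dirichlet(np.ones(5))   # arbitrary discrete q_0

kl = np.sum(q * np.log(q / q0))  # D_KL[q || q0]
cross = -np.sum(q * np.log(q0))  # cross entropy H(q, q0)
ent = -np.sum(q * np.log(q))     # entropy H(q)

assert np.isclose(kl, cross - ent)  # D_KL = H(q, q0) - H(q)
```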

Now let’s take \(q_0\) to be a flat improper prior, i.e., one assigning the same constant density \(c\) to every \(\theta\) (over an unbounded domain this cannot be normalised, hence improper). Since \(\log q_0(\theta) = \log c\) is constant, the cross entropy \(H(q, q_0) = -\int q(\theta) \log q_0(\theta)\, \mathrm{d}\theta = -\log c\) does not vary with \(q\). We can therefore drop it from the objective and rewrite the initial maximisation:

\[\begin{align} \max_{q(\theta)}[\mathbb{E}_{q(\theta)}[J(\theta)] + \alpha H(q) ]. \end{align}\]

So, by taking the regulariser on \(q\) to be the KL divergence against a flat improper prior, we recover a maximum entropy objective.
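In the discrete case the equivalence is easy to check numerically. Against a flat reference on \(n\) atoms (the proper stand-in for the improper prior), the two objectives differ only by the \(q\)-independent constant \(\alpha \log n\), so they share the same maximiser. A sketch with hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 5, 0.7
J = rng.normal(size=n)           # hypothetical utilities
q = rng.dirichlet(np.ones(n))    # an arbitrary candidate q
q0 = np.full(n, 1.0 / n)         # flat reference, standing in for the improper prior

kl_obj = q @ J - alpha * np.sum(q * np.log(q / q0))  # E_q[J] - alpha * D_KL[q || q0]
maxent_obj = q @ J - alpha * np.sum(q * np.log(q))   # E_q[J] + alpha * H(q)

# The objectives differ by the q-independent constant alpha * log(n),
# so maximising one is the same as maximising the other.
assert np.isclose(kl_obj, maxent_obj - alpha * np.log(n))
```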