Maximal Maximum-Entropy Sets

Consider a probability distribution ${p(y)}$ on a space ${\mathcal{Y}}$. Suppose we want to construct a set ${\mathcal{P}}$ of probability distributions on ${\mathcal{Y}}$ such that ${p(y)}$ is the maximum-entropy distribution over ${\mathcal{P}}$:

$\displaystyle H(p) = \max_{q \in \mathcal{P}} H(q), $

where ${H(p) = \mathbb{E}_{p}[-\log p(y)]}$ is the entropy. We call such a set a maximum-entropy set for ${p}$. Furthermore, we would like ${\mathcal{P}}$ to be as large as possible, subject to the constraint that ${\mathcal{P}}$ is convex.

Does such a maximal convex maximum-entropy set ${\mathcal{P}}$ exist? That is, is there a convex set ${\mathcal{P}}$ such that ${p}$ is the maximum-entropy distribution in ${\mathcal{P}}$, and such that every convex set ${\mathcal{Q}}$ with the same property satisfies ${\mathcal{Q} \subseteq \mathcal{P}}$? It turns out that the answer is yes, and there is even a simple characterization of ${\mathcal{P}}$:

Proposition 1 For any distribution ${p}$ on ${\mathcal{Y}}$, the set

$\displaystyle \mathcal{P} = \{q \mid \mathbb{E}_{q}[-\log p(y)] \leq H(p)\} $

is the maximal convex maximum-entropy set for ${p}$.
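
As an illustration (two standard cases, not part of the original argument), take ${\mathcal{Y}}$ finite in the first case and read ${H}$ as differential entropy in the second. If ${p}$ is uniform on ${n}$ outcomes, then ${-\log p(y) = \log n}$ for every ${y}$, so ${\mathbb{E}_{q}[-\log p(y)] = \log n = H(p)}$ for every ${q}$, and ${\mathcal{P}}$ is the whole simplex. If instead ${p}$ is the Gaussian density ${\mathcal{N}(0,\sigma^2)}$ on ${\mathbb{R}}$, then

$\displaystyle \begin{array}{rcl} -\log p(y) &=& \frac{y^2}{2\sigma^2} + \frac{1}{2}\log(2\pi\sigma^2), \\ H(p) &=& \frac{1}{2} + \frac{1}{2}\log(2\pi\sigma^2), \end{array} $

so the condition ${\mathbb{E}_{q}[-\log p(y)] \leq H(p)}$ reduces to ${\mathbb{E}_{q}[y^2] \leq \sigma^2}$, and ${\mathcal{P}}$ is the set of densities whose second moment is at most ${\sigma^2}$.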

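To see why the proposition holds, first note that ${\mathcal{P}}$ is convex, since ${\mathbb{E}_{q}[-\log p(y)]}$ is linear in ${q}$, and that ${p \in \mathcal{P}}$, since ${\mathbb{E}_{p}[-\log p(y)] = H(p)}$. Moreover, for any ${q \in \mathcal{P}}$ we have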

$\displaystyle \begin{array}{rcl} H(q) &=& \mathbb{E}_{q}[-\log q(y)] \\ &\leq& \mathbb{E}_{q}[-\log p(y)] \\ &\leq& H(p), \end{array} $

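where the first inequality is Gibbs' inequality (non-negativity of the KL divergence from ${q}$ to ${p}$) and the second is the definition of ${\mathcal{P}}$. Thus ${p}$ is indeed the maximum-entropy distribution in ${\mathcal{P}}$.

Conversely, let ${\mathcal{Q}}$ be any convex set whose maximum-entropy distribution is ${p}$, and take any ${q \in \mathcal{Q}}$. Since ${p}$ and ${q}$ both lie in ${\mathcal{Q}}$ and ${\mathcal{Q}}$ is convex, the mixture ${(1-\epsilon)p + \epsilon q}$ lies in ${\mathcal{Q}}$ for every ${\epsilon \in [0,1]}$, so we must have ${H((1-\epsilon)p + \epsilon q) \leq H(p)}$. Suppose for the sake of contradiction that ${q \not\in \mathcal{P}}$, so that ${\mathbb{E}_{q}[-\log p(y)] > H(p)}$. Then, expanding the logarithm to first order in ${\epsilon}$, we have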

$\displaystyle \begin{array}{rcl} H((1-\epsilon)p + \epsilon q) &=& \mathbb{E}_{(1-\epsilon)p+\epsilon q}[-\log((1-\epsilon)p(y)+\epsilon q(y))] \\ &=& \mathbb{E}_{(1-\epsilon)p+\epsilon q}[-\log(p(y) + \epsilon (q(y)-p(y)))] \\ &=& \mathbb{E}_{(1-\epsilon)p+\epsilon q}\left[-\log(p(y)) - \epsilon \frac{q(y)-p(y)}{p(y)} + \mathcal{O}(\epsilon^2)\right] \\ &=& H(p) + \epsilon(\mathbb{E}_{q}[-\log p(y)]-H(p)) - \epsilon \mathbb{E}_{(1-\epsilon)p+\epsilon q}\left[\frac{q(y)-p(y)}{p(y)}\right] + \mathcal{O}(\epsilon^2) \\ &=& H(p) + \epsilon(\mathbb{E}_{q}[-\log p(y)]-H(p)) - \epsilon^2 \mathbb{E}_{q}\left[\frac{q(y)-p(y)}{p(y)}\right] + \mathcal{O}(\epsilon^2) \\ &=& H(p) + \epsilon(\mathbb{E}_{q}[-\log p(y)]-H(p)) + \mathcal{O}(\epsilon^2). \end{array} $

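In the fifth line we used ${\mathbb{E}_{p}\left[\frac{q(y)-p(y)}{p(y)}\right] = 0}$, which holds because ${p}$ and ${q}$ both normalize to one (assuming ${p(y) > 0}$ wherever ${q(y) > 0}$), so only the ${\epsilon^2}$-weighted expectation under ${q}$ survives and is absorbed into ${\mathcal{O}(\epsilon^2)}$. Since ${\mathbb{E}_{q}[-\log p(y)] - H(p) > 0}$, for sufficiently small ${\epsilon > 0}$ the right-hand side exceeds ${H(p)}$, which is a contradiction. Therefore we must have ${q \in \mathcal{P}}$ for all ${q \in \mathcal{Q}}$, and hence ${\mathcal{Q} \subseteq \mathcal{P}}$, so that ${\mathcal{P}}$ is indeed the maximal convex maximum-entropy set for ${p}$.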
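
Both halves of the argument can be checked numerically on a small finite example. The following is a minimal sketch, assuming ${\mathcal{Y}}$ has three elements and natural logarithms. The distribution `p`, the helper names, the sample count, and the margin `1e-3` (used to skip points too close to the boundary of ${\mathcal{P}}$ for floating-point comparisons to be meaningful) are all illustrative choices, not part of the original argument.

```python
import math
import random


def entropy(q):
    """Shannon entropy H(q) in nats, with the convention 0 * log 0 = 0."""
    return -sum(qi * math.log(qi) for qi in q if qi > 0)


def cross_entropy(q, p):
    """E_q[-log p(y)]; assumes p(y) > 0 for every y."""
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p))


def in_max_entropy_set(q, p, tol=1e-12):
    """Membership test for P = {q : E_q[-log p] <= H(p)}."""
    return cross_entropy(q, p) <= entropy(p) + tol


def mix(p, q, eps):
    """The convex combination (1 - eps) * p + eps * q."""
    return [(1 - eps) * pi + eps * qi for pi, qi in zip(p, q)]


def random_distribution(n, rng):
    """Draw a point of the simplex by normalising exponential variates."""
    w = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(w)
    return [wi / total for wi in w]


p = [0.5, 0.3, 0.2]      # hypothetical example distribution
rng = random.Random(0)   # fixed seed for reproducibility
h_p = entropy(p)
eps_values = [10.0 ** -k for k in range(1, 7)]

n_inside = n_outside = 0
inside_ok = outside_ok = True
for _ in range(2000):
    q = random_distribution(len(p), rng)
    gap = cross_entropy(q, p) - h_p
    if gap <= 0:
        # q is in P: its entropy should not exceed H(p).
        n_inside += 1
        inside_ok = inside_ok and entropy(q) <= h_p + 1e-9
    elif gap > 1e-3:
        # q is clearly outside P: some small mixture weight should
        # push the entropy of (1 - eps) p + eps q above H(p).
        n_outside += 1
        outside_ok = outside_ok and any(
            entropy(mix(p, q, eps)) > h_p for eps in eps_values
        )

print(n_inside, n_outside, inside_ok, outside_ok)
```

If the proposition holds, both flags should come out true: no sampled ${q \in \mathcal{P}}$ beats ${H(p)}$, and every sampled ${q}$ clearly outside ${\mathcal{P}}$ yields a mixture with ${p}$ whose entropy exceeds ${H(p)}$, so no convex set containing both could have ${p}$ as its maximum-entropy distribution.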