Difficulty of Predicting the Maximum of Gaussians
Suppose that we have a random variable $X \in \mathbb{R}^d$ such that $\mathbb{E}[XX^{\top}] = I_{d \times d}$. Now take $k$ independent Gaussian random variables $Z_1, \ldots, Z_k \sim \mathcal{N}(0, I_{d \times d})$, and let $J$ be the argmax over $j \in \{1, \ldots, k\}$ of $Z_j^{\top} X$.
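To make the setup concrete, here is a minimal simulation sketch (not from the original argument; it assumes numpy, picks $d$ and $k$ arbitrarily, and takes $X$ to be a standard Gaussian, which is just one convenient choice satisfying $\mathbb{E}[XX^{\top}] = I_{d \times d}$):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 100, 5

# Z_1, ..., Z_k ~ N(0, I_d), stored as the rows of Z.
Z = rng.standard_normal((k, d))

# One convenient choice of X with E[X X^T] = I_d: a standard Gaussian.
X = rng.standard_normal((10, d))   # 10 independent draws of X

# J = argmax_j Z_j^T X, computed for each draw of X.
J = np.argmax(X @ Z.T, axis=1)
```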
It seems that it should be very hard to predict $J$ well, in the following sense: for any function $q(j \mid x)$ (a probability distribution over $j$ for each $x$), the expectation $\mathbb{E}_{x}[q(J \mid x)]$ should with high probability be very close to $\frac{1}{k}$, where the probability is taken over the randomness in $Z$. In fact, Alex Zhai and I think that the probability of this expectation exceeding $\frac{1}{k} + \epsilon$ should be at most $\exp(-C(\epsilon/k)^2 d)$ for some constant $C$. (We can already show this to be true when we replace $(\epsilon/k)^2$ with $(\epsilon/k)^4$.) I will not sketch a proof here, but the idea is pretty cool; it basically uses Lipschitz concentration of Gaussian random variables.
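As a rough numerical illustration of the conjectured behavior (again only a sketch: the choices of $d$, $k$, the sample sizes, and the softmax predictor built from fixed directions $W$ are all hypothetical, with $W$ drawn independently of $Z$ so that $q$ is covered by the claim), one can Monte Carlo estimate $\mathbb{E}_x[q(J \mid x)]$ over several independent draws of $Z$ and check how close it stays to $\frac{1}{k}$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n_x, n_trials = 200, 5, 20000, 50

# A fixed predictor q(j | x): softmax over fixed reference directions W.
# Any q chosen independently of Z is covered by the claim; this is one example.
W = rng.standard_normal((k, d))

def q(x):
    s = x @ W.T                               # scores w_j^T x, shape (n_x, k)
    s -= s.max(axis=1, keepdims=True)         # for numerical stability
    p = np.exp(s)
    return p / p.sum(axis=1, keepdims=True)   # normalize so q(. | x) sums to 1

estimates = []
for _ in range(n_trials):
    Z = rng.standard_normal((k, d))           # fresh draw of Z_1, ..., Z_k
    X = rng.standard_normal((n_x, d))         # X with E[X X^T] = I_d
    J = np.argmax(X @ Z.T, axis=1)            # argmax_j Z_j^T X for each x
    Q = q(X)
    estimates.append(Q[np.arange(n_x), J].mean())  # estimate of E_x[q(J | x)]

print(f"1/k = {1/k:.3f}, mean estimate = {np.mean(estimates):.3f}, "
      f"max estimate = {np.max(estimates):.3f}")
```

If the conjecture is right, then for large $d$ each estimate should sit very close to $\frac{1}{k}$, with upward deviations of size $\epsilon$ occurring with probability roughly $\exp(-C(\epsilon/k)^2 d)$ over the draw of $Z$.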
I’m mainly posting this problem because I think it’s pretty interesting, in case anyone else is inspired to work on it. It is closely related to the covering number of exponential families under the KL divergence, where we are interested in coverings at relatively large radii ($\log(k) - \epsilon$ rather than $\epsilon$).