How to understand the distribution of a random variable conditioned on a linear system of equations?
I am trying to understand the details in the proof of Lemma 12 of this paper in the field of compressed sensing.
The lemma considers a random vector $z$ with entries sampled i.i.d. from $\mathcal{N}(0, \sigma^2)$, conditioned on the linear system of equations $\boldsymbol{D} z = \boldsymbol{b}$, where $\boldsymbol{D} \in \mathbb{R}^{m \times n}$ is a fixed matrix and $\boldsymbol{b} \in \operatorname{col}(\boldsymbol{D}) \subseteq \mathbb{R}^m$ is a fixed vector in the column space of $\boldsymbol{D}$ (to ensure the system has at least one solution). Now, it should be "well known" that $$z|_{\boldsymbol{D} z = \boldsymbol{b}} \overset{\mathrm{d}}{=} \boldsymbol{D}^\operatorname{+} \boldsymbol{b} + (\boldsymbol{I} - \boldsymbol{D}^\operatorname{+} \boldsymbol{D}) \tilde{z},$$ where $\tilde{z}$ is a (new) random vector with i.i.d. entries from $\mathcal{N}(0, \sigma^2)$, $\boldsymbol{I}$ is the $n \times n$ identity matrix, $\boldsymbol{D}^\operatorname{+}$ is the Moore-Penrose pseudo-inverse of $\boldsymbol{D}$, and $x \overset{\mathrm{d}}{=} y$ means equality in distribution.
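As a sanity check (the script below and its variable names are my own, not from the paper), the first two moments do seem to match: the right-hand side above has mean $\boldsymbol{D}^\operatorname{+} \boldsymbol{b}$ and covariance $\sigma^2 (\boldsymbol{I} - \boldsymbol{D}^\operatorname{+} \boldsymbol{D})$, which is also what the standard Gaussian conditioning formula gives for $z$ given $\boldsymbol{D} z = \boldsymbol{b}$ (written with pseudo-inverses, which I believe handles a rank-deficient $\boldsymbol{D}$ as long as $\boldsymbol{b} \in \operatorname{col}(\boldsymbol{D})$). Since both sides are Gaussian, matching mean and covariance should be enough.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma = 3, 6, 1.5

# A fixed, deliberately rank-deficient D, and a b guaranteed to lie in col(D).
D = rng.standard_normal((m, n))
D[-1] = D[0]                        # duplicate a row so that rank(D) < m
b = D @ rng.standard_normal(n)      # b is in the column space of D by construction

Dp = np.linalg.pinv(D)              # Moore-Penrose pseudo-inverse D^+
P = np.eye(n) - Dp @ D              # orthogonal projector onto null(D)

# Mean and covariance of the claimed representation D^+ b + (I - D^+ D) z_tilde,
# with z_tilde ~ N(0, sigma^2 I); P is symmetric and idempotent, so Cov = sigma^2 P.
mean_repr = Dp @ b
cov_repr = sigma**2 * P @ P.T

# Mean and covariance of z | D z = b from the usual Gaussian conditioning formula,
# written with pseudo-inverses because D D^T is singular here.
S = sigma**2 * D @ D.T                                  # Cov(D z)
mean_cond = (sigma**2 * D.T) @ np.linalg.pinv(S) @ b
cov_cond = sigma**2 * np.eye(n) - (sigma**2 * D.T) @ np.linalg.pinv(S) @ (sigma**2 * D)

print(np.allclose(mean_repr, mean_cond))   # True
print(np.allclose(cov_repr, cov_cond))     # True
print(np.allclose(D @ mean_repr, b))       # True: the representation satisfies D z = b
```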
Now, the proof states that this is trivial for $\boldsymbol{D} = \begin{bmatrix}\boldsymbol{I} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{0}\end{bmatrix}$. I can follow this argument, although I am not sure I understand why $\boldsymbol{I} - \boldsymbol{D}^\operatorname{+} \boldsymbol{D}$, i.e., the orthogonal projection onto the null space of $\boldsymbol{D}$, is the only/right way to account for the fact that some entries of $z$ must be ignored (the sketch below is how I picture this case). However, my main issue lies with the subsequent argument that the general case holds because of the rotational invariance of the Gaussian distribution; I do not see how this is the case.
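To make the trivial case concrete for myself, here is a small numpy sketch (my own illustration, taking $\boldsymbol{D}$ square for simplicity): the representation pins the constrained entries of $z$ to the corresponding entries of $\boldsymbol{b}$ and simply resamples the free ones.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma = 5, 2, 1.0

# The "trivial" case: D = [[I_k, 0], [0, 0]] (taken square here for simplicity).
D = np.zeros((n, n))
D[:k, :k] = np.eye(k)

# b must lie in col(D), i.e. only its first k entries may be nonzero.
b = np.zeros(n)
b[:k] = rng.standard_normal(k)

Dp = np.linalg.pinv(D)
print(np.allclose(Dp, D))                                   # True: this D is its own pseudo-inverse
print(np.allclose(np.eye(n) - Dp @ D,
                  np.diag([0.0] * k + [1.0] * (n - k))))    # True: projector onto the free coordinates

# D^+ b + (I - D^+ D) z_tilde pins the first k entries to b[:k]
# and replaces the last n-k entries with fresh N(0, sigma^2) draws.
z_tilde = sigma * rng.standard_normal(n)
z_cond = Dp @ b + (np.eye(n) - Dp @ D) @ z_tilde
print(np.allclose(z_cond[:k], b[:k]))                       # True
print(np.allclose(z_cond[k:], z_tilde[k:]))                 # True
```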
I do understand that using the singular value decomposition of $\boldsymbol{D} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^\mathsf{T}$, the system can be converted to a diagonal system, but I do not understand how the rotational invariance of the Gaussian distribution comes into play. After all, using some linear algebra and starting from the diagonal case, my reasoning would go as follows: $$\begin{align} z|_{\boldsymbol{\Sigma} z = \boldsymbol{b}} &\overset{\mathrm{d}}{=} \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{b} + (\boldsymbol{I} - \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{\Sigma}) \tilde{z} \\ z|_{\boldsymbol{\Sigma} z = \boldsymbol{U}^\mathsf{T} \boldsymbol{U} \boldsymbol{b}} &\overset{\mathrm{d}}{=} \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \boldsymbol{U} \boldsymbol{b} + (\boldsymbol{I} - \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \boldsymbol{U} \boldsymbol{\Sigma}) \tilde{z} \\ z|_{\boldsymbol{\Sigma} z = \boldsymbol{U}^\mathsf{T} \mathring{\boldsymbol{b}}} &\overset{\mathrm{d}}{=} \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \mathring{\boldsymbol{b}} + (\boldsymbol{I} - \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \boldsymbol{U} \boldsymbol{\Sigma}) \tilde{z} \tag{$\mathring{\boldsymbol{b}} = \boldsymbol{U} \boldsymbol{b}$} \\ \boldsymbol{V} z|_{\boldsymbol{\Sigma} \boldsymbol{V}^\mathsf{T} \boldsymbol{V} z = \boldsymbol{U}^\mathsf{T} \mathring{\boldsymbol{b}}} &\overset{\mathrm{d}}{=} \boldsymbol{V} \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \mathring{\boldsymbol{b}} + \boldsymbol{V} (\boldsymbol{I} - \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \boldsymbol{U} \boldsymbol{\Sigma}) \boldsymbol{V}^\mathsf{T} \boldsymbol{V} \tilde{z} \\ \mathring{z}|_{\boldsymbol{\Sigma} \boldsymbol{V}^\mathsf{T} \mathring{z} = \boldsymbol{U}^\mathsf{T} \mathring{\boldsymbol{b}}} &\overset{\mathrm{d}}{=} \boldsymbol{V} \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \mathring{\boldsymbol{b}} + (\boldsymbol{I} - \boldsymbol{V} \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^\mathsf{T}) \boldsymbol{V} \tilde{z} \tag{$\mathring{z} = \boldsymbol{V} z$} \\ \mathring{z}|_{\boldsymbol{\Sigma} \boldsymbol{V}^\mathsf{T} \mathring{z} = \boldsymbol{U}^\mathsf{T} \mathring{\boldsymbol{b}}} &\overset{\mathrm{d}}{=} \boldsymbol{D}^\operatorname{+} \mathring{\boldsymbol{b}} + (\boldsymbol{I} - \boldsymbol{D}^\operatorname{+} \boldsymbol{D}) \boldsymbol{V} \tilde{z}. \end{align}$$ Now, I do not see why rotational invariance of the distributions of $z$ and $\tilde{z}$ would be needed here: it seems I can simply define $\tilde{\mathring{z}} = \boldsymbol{V} \tilde{z}$ in the last line and multiply the constraint on the left by $\boldsymbol{U}$ to obtain $\boldsymbol{D} \mathring{z} = \mathring{\boldsymbol{b}}$, which gives the desired result. I realise that $z$ and $\tilde{z}$ are being rotated by different matrices, but I do not see why that would matter.
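For what it is worth, the purely algebraic ingredients of this chain check out numerically (again my own script and names): the last step uses $\boldsymbol{D}^\operatorname{+} = \boldsymbol{V} \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T}$, and $(\boldsymbol{I} - \boldsymbol{D}^\operatorname{+} \boldsymbol{D}) \boldsymbol{V} \tilde{z}$ has the same covariance as $(\boldsymbol{I} - \boldsymbol{D}^\operatorname{+} \boldsymbol{D}) \tilde{z}$, so what remains unclear to me is only the justification for swapping $\boldsymbol{V} \tilde{z}$ for a fresh $\tilde{z}$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, sigma = 3, 6, 1.0
D = rng.standard_normal((m, n))

# SVD pieces used in the chain above (full matrices, so U is m x m and V is n x n).
U, s, Vt = np.linalg.svd(D)
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)
V = Vt.T

# The last step of the chain uses D^+ = V Sigma^+ U^T:
print(np.allclose(np.linalg.pinv(D), V @ np.linalg.pinv(Sigma) @ U.T))  # True

# The final right-hand side is driven by V z_tilde instead of z_tilde; its covariance
# sigma^2 (I - D^+ D) V V^T (I - D^+ D)^T equals sigma^2 (I - D^+ D) (I - D^+ D)^T,
# the covariance in the lemma's statement, because V is orthogonal.
P = np.eye(n) - np.linalg.pinv(D) @ D
print(np.allclose(P @ (sigma**2 * V @ V.T) @ P.T, sigma**2 * P @ P.T))  # True
```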
Therefore, my question is:
Why is rotational invariance (of the distribution of the entries of $z$) relevant to this proof?
Bonus points if you can also explain where the orthogonal projection comes from :)