How to understand the distribution of a random variable conditioned on a linear system of equations?
I am trying to understand the details in the proof of Lemma 12 of this paper in the field of compressed sensing.
The lemma considers a random vector $z$ with entries sampled i.i.d. from $\mathcal{N}(0, \sigma^2)$, conditioned on the linear system of equations $\boldsymbol{D} z = \boldsymbol{b}$, where $\boldsymbol{D} \in \mathbb{R}^{m \times n}$ is a fixed matrix and $\boldsymbol{b} \in \operatorname{col}(\boldsymbol{D}) \subseteq \mathbb{R}^m$ is a fixed vector in the column space of $\boldsymbol{D}$ (to ensure the system has at least one solution). Now, it should be "well known" that $$z|_{\boldsymbol{D} z = \boldsymbol{b}} \overset{\mathrm{d}}{=} \boldsymbol{D}^\operatorname{+} \boldsymbol{b} + (\boldsymbol{I} - \boldsymbol{D}^\operatorname{+} \boldsymbol{D}) \tilde{z},$$ where $\tilde{z}$ is a (new) random vector with i.i.d. entries from $\mathcal{N}(0, \sigma^2)$, $\boldsymbol{I}$ is the $n \times n$ identity matrix, $\boldsymbol{D}^\operatorname{+}$ is the Moore-Penrose pseudo-inverse of $\boldsymbol{D}$, and $x \overset{\mathrm{d}}{=} y$ means equality in distribution.
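As a sanity check (the script below and its variable names are my own, not from the paper), the first two moments do seem to match: the right-hand side above has mean $\boldsymbol{D}^\operatorname{+} \boldsymbol{b}$ and covariance $\sigma^2 (\boldsymbol{I} - \boldsymbol{D}^\operatorname{+} \boldsymbol{D})$, which is also what the standard Gaussian conditioning formula gives for $z$ given $\boldsymbol{D} z = \boldsymbol{b}$ (written with pseudo-inverses, which I believe handles a rank-deficient $\boldsymbol{D}$ as long as $\boldsymbol{b} \in \operatorname{col}(\boldsymbol{D})$). Since both sides are Gaussian, matching mean and covariance should be enough.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma = 3, 6, 1.5

# A fixed, deliberately rank-deficient D, and a b guaranteed to lie in col(D).
D = rng.standard_normal((m, n))
D[-1] = D[0]                        # duplicate a row so that rank(D) < m
b = D @ rng.standard_normal(n)      # b is in the column space of D by construction

Dp = np.linalg.pinv(D)              # Moore-Penrose pseudo-inverse D^+
P = np.eye(n) - Dp @ D              # orthogonal projector onto null(D)

# Mean and covariance of the claimed representation D^+ b + (I - D^+ D) z_tilde,
# with z_tilde ~ N(0, sigma^2 I); P is symmetric and idempotent, so Cov = sigma^2 P.
mean_repr = Dp @ b
cov_repr = sigma**2 * P @ P.T

# Mean and covariance of z | D z = b from the usual Gaussian conditioning formula,
# written with pseudo-inverses because D D^T is singular here.
S = sigma**2 * D @ D.T                                  # Cov(D z)
mean_cond = (sigma**2 * D.T) @ np.linalg.pinv(S) @ b
cov_cond = sigma**2 * np.eye(n) - (sigma**2 * D.T) @ np.linalg.pinv(S) @ (sigma**2 * D)

print(np.allclose(mean_repr, mean_cond))   # True
print(np.allclose(cov_repr, cov_cond))     # True
print(np.allclose(D @ mean_repr, b))       # True: the representation satisfies D z = b
```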
Now, the proof states that this is trivial for $\boldsymbol{D} = \begin{bmatrix}\boldsymbol{I} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{0}\end{bmatrix}$. I can follow this argument, although I am not sure I understand why $\boldsymbol{I} - \boldsymbol{D}^\operatorname{+} \boldsymbol{D}$, i.e., the orthogonal projection onto the null space of $\boldsymbol{D}$, is the only/right way to account for the fact that some entries of $z$ must be ignored (the sketch below is how I picture this case). However, my main issue lies with the subsequent argument that the general case holds because of the rotational invariance of the Gaussian distribution; I do not see how this is the case.
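To make the trivial case concrete for myself, here is a small numpy sketch (my own illustration, taking $\boldsymbol{D}$ square for simplicity): the representation pins the constrained entries of $z$ to the corresponding entries of $\boldsymbol{b}$ and simply resamples the free ones.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma = 5, 2, 1.0

# The "trivial" case: D = [[I_k, 0], [0, 0]] (taken square here for simplicity).
D = np.zeros((n, n))
D[:k, :k] = np.eye(k)

# b must lie in col(D), i.e. only its first k entries may be nonzero.
b = np.zeros(n)
b[:k] = rng.standard_normal(k)

Dp = np.linalg.pinv(D)
print(np.allclose(Dp, D))                                   # True: this D is its own pseudo-inverse
print(np.allclose(np.eye(n) - Dp @ D,
                  np.diag([0.0] * k + [1.0] * (n - k))))    # True: projector onto the free coordinates

# D^+ b + (I - D^+ D) z_tilde pins the first k entries to b[:k]
# and replaces the last n-k entries with fresh N(0, sigma^2) draws.
z_tilde = sigma * rng.standard_normal(n)
z_cond = Dp @ b + (np.eye(n) - Dp @ D) @ z_tilde
print(np.allclose(z_cond[:k], b[:k]))                       # True
print(np.allclose(z_cond[k:], z_tilde[k:]))                 # True
```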
I do understand that using the singular value decomposition of $\boldsymbol{D} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^\mathsf{T}$, the system can be converted to a diagonal system, but I do not understand how the rotational invariance of the Gaussian distribution comes into play. After all, using some linear algebra and starting from the diagonal case, my reasoning would go as follows: $$\begin{align} z|_{\boldsymbol{\Sigma} z = \boldsymbol{b}} &\overset{\mathrm{d}}{=} \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{b} + (\boldsymbol{I} - \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{\Sigma}) \tilde{z} \\ z|_{\boldsymbol{\Sigma} z = \boldsymbol{U}^\mathsf{T} \boldsymbol{U} \boldsymbol{b}} &\overset{\mathrm{d}}{=} \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \boldsymbol{U} \boldsymbol{b} + (\boldsymbol{I} - \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \boldsymbol{U} \boldsymbol{\Sigma}) \tilde{z} \\ z|_{\boldsymbol{\Sigma} z = \boldsymbol{U}^\mathsf{T} \mathring{\boldsymbol{b}}} &\overset{\mathrm{d}}{=} \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \mathring{\boldsymbol{b}} + (\boldsymbol{I} - \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \boldsymbol{U} \boldsymbol{\Sigma}) \tilde{z} \tag{$\mathring{\boldsymbol{b}} = \boldsymbol{U} \boldsymbol{b}$} \\ \boldsymbol{V} z|_{\boldsymbol{\Sigma} \boldsymbol{V}^\mathsf{T} \boldsymbol{V} z = \boldsymbol{U}^\mathsf{T} \mathring{\boldsymbol{b}}} &\overset{\mathrm{d}}{=} \boldsymbol{V} \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \mathring{\boldsymbol{b}} + \boldsymbol{V} (\boldsymbol{I} - \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \boldsymbol{U} \boldsymbol{\Sigma}) \boldsymbol{V}^\mathsf{T} \boldsymbol{V} \tilde{z} \\ \mathring{z}|_{\boldsymbol{\Sigma} \boldsymbol{V}^\mathsf{T} \mathring{z} = \boldsymbol{U}^\mathsf{T} \mathring{\boldsymbol{b}}} &\overset{\mathrm{d}}{=} \boldsymbol{V} \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \mathring{\boldsymbol{b}} + (\boldsymbol{I} - \boldsymbol{V} \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T} \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^\mathsf{T}) \boldsymbol{V} \tilde{z} \tag{$\mathring{z} = \boldsymbol{V} z$} \\ \mathring{z}|_{\boldsymbol{\Sigma} \boldsymbol{V}^\mathsf{T} \mathring{z} = \boldsymbol{U}^\mathsf{T} \mathring{\boldsymbol{b}}} &\overset{\mathrm{d}}{=} \boldsymbol{D}^\operatorname{+} \mathring{\boldsymbol{b}} + (\boldsymbol{I} - \boldsymbol{D}^\operatorname{+} \boldsymbol{D}) \boldsymbol{V} \tilde{z}. \end{align}$$ Now, I do not see why rotational invariance of the distributions of $z$ and $\tilde{z}$ would be needed here: it seems I can simply define $\tilde{\mathring{z}} = \boldsymbol{V} \tilde{z}$ in the last line and multiply the constraint on the left by $\boldsymbol{U}$ to obtain $\boldsymbol{D} \mathring{z} = \mathring{\boldsymbol{b}}$, which gives the desired result. I realise that $z$ and $\tilde{z}$ are being rotated by different matrices, but I do not see why that would matter.
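For what it is worth, the purely algebraic ingredients of this chain check out numerically (again my own script and names): the last step uses $\boldsymbol{D}^\operatorname{+} = \boldsymbol{V} \boldsymbol{\Sigma}^\operatorname{+} \boldsymbol{U}^\mathsf{T}$, and $(\boldsymbol{I} - \boldsymbol{D}^\operatorname{+} \boldsymbol{D}) \boldsymbol{V} \tilde{z}$ has the same covariance as $(\boldsymbol{I} - \boldsymbol{D}^\operatorname{+} \boldsymbol{D}) \tilde{z}$, so what remains unclear to me is only the justification for swapping $\boldsymbol{V} \tilde{z}$ for a fresh $\tilde{z}$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, sigma = 3, 6, 1.0
D = rng.standard_normal((m, n))

# SVD pieces used in the chain above (full matrices, so U is m x m and V is n x n).
U, s, Vt = np.linalg.svd(D)
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)
V = Vt.T

# The last step of the chain uses D^+ = V Sigma^+ U^T:
print(np.allclose(np.linalg.pinv(D), V @ np.linalg.pinv(Sigma) @ U.T))  # True

# The final right-hand side is driven by V z_tilde instead of z_tilde; its covariance
# sigma^2 (I - D^+ D) V V^T (I - D^+ D)^T equals sigma^2 (I - D^+ D) (I - D^+ D)^T,
# the covariance in the lemma's statement, because V is orthogonal.
P = np.eye(n) - np.linalg.pinv(D) @ D
print(np.allclose(P @ (sigma**2 * V @ V.T) @ P.T, sigma**2 * P @ P.T))  # True
```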
Therefore, my question is:
Why is rotational invariance (of the distribution of the entries of $z$) relevant to this proof?
Bonus points if you can also explain where the orthogonal projection comes from :)