Describes random variables in terms of Hilbert spaces, defining inner products, norms, and minimum mean square error estimation.

Random variable spaces

Probability – notation primer

Definition 1 A random variable $X$ is defined by a distribution function

$$P(x) = F_X(x) = \mathrm{Prob}(X \leq x)$$

The density function is given by

$$\frac{\partial P(x)}{\partial x} = f_X(x) = \frac{\partial\, \mathrm{Prob}(X \leq x)}{\partial x}$$

Definition 2 The expectation of a function $g(x)$ over the random variable $X$ is

$$E_X[g(x)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx$$

Definition 3 Pairs of random variables $X, Y$ are defined by the joint distribution function

$$P(x, y) = F_{XY}(x, y) = \mathrm{Prob}(X \leq x, Y \leq y)$$

The joint density function is given by

$$\frac{\partial^2 P(x, y)}{\partial x\, \partial y} = f_{XY}(x, y) = \frac{\partial^2\, \mathrm{Prob}(X \leq x, Y \leq y)}{\partial x\, \partial y}$$

The expectation of a function $g(x, y)$ is given by

$$E_{X,Y}[g(x, y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{XY}(x, y)\, dx\, dy$$
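
As a quick numerical illustration of these definitions (not part of the original module), the sketch below estimates $E_X[g(x)]$ for a standard Gaussian $X$ with $g(x) = x^2$, both by integrating $g(x) f_X(x)$ and by averaging over samples; the choice of distribution, of $g$, and of NumPy/SciPy as tools are illustrative assumptions.

```python
# Sketch: estimate E_X[g(x)] two ways for a standard Gaussian X (illustrative).
# For g(x) = x**2 the exact answer is Var(X) = 1.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

g = lambda x: x**2

# Integral form: E_X[g(x)] = \int g(x) f_X(x) dx
integral_value, _ = quad(lambda x: g(x) * norm.pdf(x), -np.inf, np.inf)

# Monte Carlo form: average g over samples drawn from f_X
rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)
mc_value = g(samples).mean()

print(integral_value, mc_value)  # both close to 1.0
```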

A Hilbert space of random variables

Definition 4 Let $\{Y_1, \ldots, Y_n\}$ be a collection of zero-mean ($E[Y_i] = 0$) random variables. The space $H$ of all random variables that are linear combinations of those $n$ random variables $\{Y_1, \ldots, Y_n\}$ is a Hilbert space with inner product

$$\langle X, Y \rangle = E[X \overline{Y}] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x \overline{y}\, f_{XY}(x, y)\, dx\, dy.$$

We can easily check that this is a valid inner product:

  • $\langle X, X \rangle = E[X \overline{X}] = \int_{-\infty}^{\infty} |x|^2 f_X(x)\, dx = E[|X|^2] \geq 0$;
  • $\langle X, X \rangle = 0$ if and only if $f_X(x) = \delta(x)$, i.e., if $X$ is a random variable that is deterministically zero (and this random variable is the “zero” of this Hilbert space);
  • $\langle X, Y \rangle = \overline{\langle Y, X \rangle}$;
  • $\langle X + Y, Z \rangle = E[(X + Y)\overline{Z}] = E[X \overline{Z}] + E[Y \overline{Z}] = \langle X, Z \rangle + \langle Y, Z \rangle$;
  • $\langle \alpha X, Y \rangle = E[\alpha X \overline{Y}] = \alpha\, E[X \overline{Y}] = \alpha \langle X, Y \rangle$.

Note in particular that orthogonality, i.e., $\langle X, Y \rangle = 0$, is equivalent to $E[X \overline{Y}] = 0$, i.e., $X$ and $Y$ are uncorrelated random variables. Additionally, the induced norm $\|X\| = \sqrt{\langle X, X \rangle} = \sqrt{E[|X|^2]}$ is the standard deviation of the zero-mean random variable $X$.
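
To make the inner product concrete, the following sketch (an added illustration with an assumed joint distribution) estimates $\langle X, Y \rangle = E[X \overline{Y}]$ and the induced norm from samples, and checks that an uncorrelated pair is orthogonal in this sense.

```python
# Sketch: sample estimates of the inner product <X, Y> = E[X conj(Y)]
# and the induced norm ||X|| = sqrt(E[|X|^2]); the joint distribution
# below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(1)
n_samples = 200_000

x = rng.standard_normal(n_samples)   # zero-mean, unit variance
w = rng.standard_normal(n_samples)   # independent of x
y = 0.5 * x + w                      # zero-mean, correlated with x

inner_xy = np.mean(x * np.conj(y))        # estimate of <X, Y>, about 0.5
inner_xw = np.mean(x * np.conj(w))        # x and w are orthogonal, about 0.0
norm_x = np.sqrt(np.mean(np.abs(x)**2))   # induced norm = standard deviation, about 1.0

print(inner_xy, inner_xw, norm_x)
```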

A Hilbert space of random vectors

One can define random vectors $X, Y$ whose entries are random variables:

$$X = \begin{bmatrix} X_1 \\ \vdots \\ X_N \end{bmatrix}, \qquad Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_N \end{bmatrix}.$$

For these, the following inner product is an extension of that given above:

$$\langle X, Y \rangle = E[Y^H X] = E\left[\sum_{i=1}^{N} \overline{Y_i} X_i\right] = E\left[\mathrm{trace}(X Y^H)\right].$$

The induced norm is

$$\|X\| = \sqrt{\langle X, X \rangle} = \sqrt{E[X^H X]} = \sqrt{E\left[\sum_{i=1}^{N} |X_i|^2\right]},$$

i.e., the square root of the expected squared Euclidean norm of the vector $X$.
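
The same sample-average computation extends to random vectors; the sketch below (with illustrative dimensions and distributions) estimates $\langle X, Y \rangle = E[Y^H X]$ and the induced norm by averaging over draws.

```python
# Sketch: the random-vector inner product <X, Y> = E[Y^H X] = E[trace(X Y^H)]
# and induced norm, estimated by sample averages; dimensions and
# distributions are illustrative.
import numpy as np

rng = np.random.default_rng(2)
N, n_samples = 3, 100_000

X = rng.standard_normal((n_samples, N))   # each row is one draw of the vector X
Y = rng.standard_normal((n_samples, N))   # each row is one draw of the vector Y

# <X, Y> = E[sum_i conj(Y_i) X_i], approximated by averaging over the draws
inner_XY = np.mean(np.sum(np.conj(Y) * X, axis=1))

# ||X|| = sqrt(E[X^H X]) = sqrt(E[sum_i |X_i|^2])
norm_X = np.sqrt(np.mean(np.sum(np.abs(X)**2, axis=1)))

print(inner_XY, norm_X)   # inner_XY ~ 0 (X, Y independent); norm_X ~ sqrt(N)
```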

Minimum mean square error estimation

In an MMSE estimation problem, we consider $Y = AX + N$, where $X, Y$ are two random vectors and $N$ is usually additive white Gaussian noise ($Y$ is $m \times 1$, $A$ is $m \times n$, $X$ is $n \times 1$, and $N \sim \mathcal{N}(0, \sigma^2 I)$ is $m \times 1$). Due to this noise model, we want an estimate $\hat{X}$ of $X$ that minimizes $E[\|X - \hat{X}\|^2]$; such an estimate has highest likelihood under an additive white Gaussian noise model. For computational simplicity, we often want to restrict the estimator to be linear, i.e.,

$$\hat{X} = KY = \begin{bmatrix} K_1^H \\ \vdots \\ K_n^H \end{bmatrix} Y,$$

where $K_i^H$ denotes the $i$th row of the estimation matrix $K$ and $\hat{X}_i = K_i^H Y$. We use the definition of the $\ell_2$ norm to simplify the equation:

$$\min_K E\left[\|X - \hat{X}\|_2^2\right] = \min_K E\left[\|X - KY\|_2^2\right] = \min_K E\left[\sum_{i=1}^{n} (X_i - K_i^H Y)^2\right].$$

Since each term in the sum is nonnegative and depends on a different row $K_i$ of $K$, this minimization can be posed as $n$ individual minimizations: for $i = 1, 2, \ldots, n$, we solve

$$\min_{K_i} E\left[(X_i - K_i^H Y)^2\right] = \min_{K_i} E\left[\left(X_i - \sum_{j=1}^{m} \overline{K_{ij}} Y_j\right)^2\right] = \min_{K_i} \left\|X_i - \sum_{j=1}^{m} \overline{K_{ij}} Y_j\right\|^2,$$

where the norm is the induced norm for the Hilbert space of random variables. Note at this point that the set of random variables $\sum_{j=1}^{m} \overline{K_{ij}} Y_j$ over all choices of $K_i$ can be written as $\mathrm{span}(\{Y_j\}_{j=1}^{m})$. Thus, the optimal $K_i$ is given by the coefficients of the closest point in $\mathrm{span}(\{Y_j\}_{j=1}^{m})$ to the random variable $X_i$ according to the induced norm for the Hilbert space of random variables. Therefore, we solve for $K_i$ using the projection theorem with the corresponding inner product. Recall that given a basis $\{Y_j\}$ for the subspace of interest, we obtain the equation $\beta_i = G (K_i^H)^T = G \overline{K_i}$, where $\beta_{i,j} = \langle X_i, Y_j \rangle$ and $G$ is the Gramian matrix. More specifically, we have

$$\underbrace{\begin{bmatrix} \langle X_i, Y_1 \rangle \\ \langle X_i, Y_2 \rangle \\ \vdots \\ \langle X_i, Y_m \rangle \end{bmatrix}}_{\beta_i} = \underbrace{\begin{bmatrix} \langle Y_1, Y_1 \rangle & \langle Y_2, Y_1 \rangle & \cdots & \langle Y_m, Y_1 \rangle \\ \langle Y_1, Y_2 \rangle & \langle Y_2, Y_2 \rangle & \cdots & \langle Y_m, Y_2 \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle Y_1, Y_m \rangle & \langle Y_2, Y_m \rangle & \cdots & \langle Y_m, Y_m \rangle \end{bmatrix}}_{G} \underbrace{\begin{bmatrix} \overline{K_{i1}} \\ \overline{K_{i2}} \\ \vdots \\ \overline{K_{im}} \end{bmatrix}}_{\overline{K_i}}.$$

Thus, one can solve for $\overline{K_i} = G^{-1} \beta_i$. In the Hilbert space of random variables, we have

$$G = \begin{bmatrix} E[Y_1 \overline{Y_1}] & E[Y_2 \overline{Y_1}] & \cdots & E[Y_m \overline{Y_1}] \\ E[Y_1 \overline{Y_2}] & E[Y_2 \overline{Y_2}] & \cdots & E[Y_m \overline{Y_2}] \\ \vdots & \vdots & \ddots & \vdots \\ E[Y_1 \overline{Y_m}] & E[Y_2 \overline{Y_m}] & \cdots & E[Y_m \overline{Y_m}] \end{bmatrix} = R_Y, \qquad \beta_i = \begin{bmatrix} E[X_i \overline{Y_1}] \\ E[X_i \overline{Y_2}] \\ \vdots \\ E[X_i \overline{Y_m}] \end{bmatrix} = \rho_{X_i Y}.$$

Here $R_Y$ is the correlation matrix of the random vector $Y$ and $\rho_{X_i Y}$ is the cross-correlation vector of the random variable $X_i$ and the vector $Y$. Thus, we have $\overline{K_i} = G^{-1} \beta_i = R_Y^{-1} \rho_{X_i Y}$, and so $K_i^H = \rho_{X_i Y}^T R_Y^{-1}$. Concatenating all the rows of $K$ together, we get $K = R_{X,Y} R_Y^{-1}$, where $R_{X,Y}$ is the cross-correlation matrix for the random vectors $X$ and $Y$. We therefore obtain the optimal linear estimator $\hat{X} = KY = R_{X,Y} R_Y^{-1} Y$.
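
As a concrete illustration of this result, the sketch below builds the linear MMSE estimator $K = R_{X,Y} R_Y^{-1}$ for the model $Y = AX + N$ with zero-mean $X$ independent of $N$, in which case $R_Y = A R_X A^H + \sigma^2 I$ and $R_{X,Y} = R_X A^H$; the dimensions, $R_X$, and $\sigma$ below are illustrative assumptions.

```python
# Sketch: linear MMSE estimator K = R_{X,Y} R_Y^{-1} for Y = A X + N,
# assuming zero-mean X independent of N ~ N(0, sigma^2 I), so that
# R_Y = A R_X A^H + sigma^2 I and R_{X,Y} = R_X A^H.
# The dimensions, R_X, and sigma are illustrative choices.
import numpy as np

rng = np.random.default_rng(3)
n, m, sigma = 4, 6, 0.1

A = rng.standard_normal((m, n))
R_X = np.eye(n)                                       # assumed covariance of X

R_Y = A @ R_X @ A.conj().T + sigma**2 * np.eye(m)     # correlation matrix of Y
R_XY = R_X @ A.conj().T                               # cross-correlation of X and Y
K = R_XY @ np.linalg.inv(R_Y)                         # optimal linear estimator

# Apply the estimator to one simulated draw of the model
x = rng.multivariate_normal(np.zeros(n), R_X)
y = A @ x + sigma * rng.standard_normal(m)
x_hat = K @ y

print(np.linalg.norm(x - x_hat))   # small estimation error for low noise
```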

At first, there may be some confusion about the difference between least squares and minimum mean square error estimation. To summarize:

  • Least squares is applied when the quantities observed are deterministic (i.e., a “single draw” of data or observations).
  • Minimum mean square error estimation is applied when random variables are observed under Gaussian noise; one must know a distribution over the inputs, and the error is measured in expectation. A brief numerical comparison of the two is sketched below.
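
The following sketch contrasts the two estimators on a single simulated draw; the model and parameters are illustrative. Least squares uses only $A$ and the observation $Y$, while the linear MMSE estimator also uses $R_X$ and the noise variance.

```python
# Sketch: least squares vs. linear MMSE on one simulated draw of Y = A X + N.
# Least squares uses only A and the observation; MMSE also uses R_X and sigma.
# All parameters below are illustrative.
import numpy as np

rng = np.random.default_rng(4)
n, m, sigma = 4, 6, 0.5

A = rng.standard_normal((m, n))
R_X = np.eye(n)
x = rng.multivariate_normal(np.zeros(n), R_X)
y = A @ x + sigma * rng.standard_normal(m)

# Least squares: treats y as a single deterministic observation, no prior on X
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

# Linear MMSE: K = R_{X,Y} R_Y^{-1}, using the known second-order statistics
K = (R_X @ A.T) @ np.linalg.inv(A @ R_X @ A.T + sigma**2 * np.eye(m))
x_mmse = K @ y

print(np.linalg.norm(x - x_ls), np.linalg.norm(x - x_mmse))
```

As the noise level grows, the MMSE estimate is pulled toward the prior mean of $X$, while the least squares solution is not.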





Source:  OpenStax, Signal theory. OpenStax CNX. Oct 18, 2013 Download for free at http://legacy.cnx.org/content/col11542/1.3