We include some background material for the course. Let us recall some notions of convergence of random variables (RVs).
A sequence of RVs ${x}_{n}$ converges to $\overline{x}$ in the ${\ell}_{p}$ sense if $\lim_{n\to\infty}E\left[|{x}_{n}-\overline{x}{|}^{p}\right]=0$. For example, for $p=2$ we have mean square convergence, ${x}_{n}\stackrel{m.s.}{\to}\overline{x}$ . For $p\ge 2$ , Lyapunov's inequality gives
$${\left(E\left[|{x}_{n}-\overline{x}{|}^{p-1}\right]\right)}^{1/(p-1)}\le {\left(E\left[|{x}_{n}-\overline{x}{|}^{p}\right]\right)}^{1/p}.$$
Therefore, ${x}_{n}\stackrel{{\ell}_{p}}{\to}\overline{x}$ implies ${x}_{n}\stackrel{{\ell}_{p-1}}{\to}\overline{x}$ . Note that convergence in the ${\ell}_{1}$ sense implies convergence in probability, since by Markov's inequality, for every $\epsilon>0$,
$$\Pr\left(|{x}_{n}-\overline{x}|>\epsilon\right)\le \frac{E\left[|{x}_{n}-\overline{x}|\right]}{\epsilon}\to 0.$$
The following material appears in most textbooks on information theory (cf. Cover and Thomas and references therein). We include the highlights in order to make these notes self-contained, but skip some details and the proofs. Consider a sequence $x={x}^{n}=({x}_{1},{x}_{2},...,{x}_{n})$ , where ${x}_{i}\in \alpha $ ; here $\alpha$ is the alphabet, and its cardinality is $r$ , i.e., $\left|\alpha \right|=r$ .
Definition 1 The type ${P}_{x}$ of $x$ consists of the empirical probabilities of symbols in $x$ ,
$${P}_{x}(a)=\frac{{n}_{x}(a)}{n},\quad a\in\alpha,$$
where ${n}_{x}\left(a\right)$ is the empirical symbol count , which is the number of times that $a\in \alpha $ appears in $x$ .
Definition 2 The set of all possible types of length-$n$ sequences is denoted by ${P}_{n}$ .
For an alphabet $\alpha =\{0,1\}$ we have ${P}_{n}=\{(\frac{0}{n},\frac{n}{n}),(\frac{1}{n},\frac{n-1}{n}),...,(\frac{n}{n},\frac{0}{n})\}$ . In this case, $|{P}_{n}|=n+1$ .
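As a quick sanity check, the binary types can be enumerated directly. This is a minimal sketch; the function name `binary_types` is ours, not from the notes:

```python
from fractions import Fraction

def binary_types(n):
    """All types (empirical distributions) of binary strings of length n."""
    return [(Fraction(k, n), Fraction(n - k, n)) for k in range(n + 1)]

types = binary_types(5)
print(len(types))  # n + 1 = 6 types for n = 5
```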
Definition 3 A type class ${T}_{x}$ contains all ${x}^{\text{'}}\in {\alpha}^{n}$ such that ${P}_{{x}^{\text{'}}}={P}_{x}$ ,
$${T}_{x}=\left\{{x}^{\text{'}}\in{\alpha}^{n}:{P}_{{x}^{\text{'}}}={P}_{x}\right\}.$$
Consider $\alpha =\{1,2,3\}$ and $x=11321$ . We have $n=5$ and the empirical counts are ${n}_{x}=(3,1,1)$ . Therefore, the type is ${P}_{x}=(\frac{3}{5},\frac{1}{5},\frac{1}{5})$ , and the type class ${T}_{x}$ contains all length-5 sequences with 3 ones, 1 two, and 1 three. That is, ${T}_{x}=\{11123,11132,...,32111\}$ . It is easy to see that $|{T}_{x}|=\frac{5!}{3!1!1!}=20$ .
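The type and the type-class cardinality from this example can be computed directly; the helper names below are illustrative, not from the notes:

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod

def type_of(x, alphabet):
    """Empirical distribution (type) of sequence x over the given alphabet."""
    counts = Counter(x)
    n = len(x)
    return {a: Fraction(counts[a], n) for a in alphabet}

def type_class_size(x):
    """|T_x| = n! / (n_x(a_1)! n_x(a_2)! ...), a multinomial coefficient."""
    counts = Counter(x)
    return factorial(len(x)) // prod(factorial(c) for c in counts.values())

x = (1, 1, 3, 2, 1)  # the sequence x = 11321 from the example
print(type_of(x, (1, 2, 3)))  # type (3/5, 1/5, 1/5)
print(type_class_size(x))     # 20
```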
Theorem 1 The cardinality of the set of all types satisfies $|{P}_{n}{|\le (n+1)}^{r-1}$ .
The proof is simple, and was given in class. We note in passing that this bound is loose, but it is good enough for our discussion.
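For small $n$ and $r$, the bound of Theorem 1 can be verified by brute-force enumeration; a sketch, with an illustrative helper name:

```python
from itertools import product
from fractions import Fraction

def num_types(n, r):
    """Count distinct types of sequences in {0,...,r-1}^n by enumeration."""
    alphabet = range(r)
    types = set()
    for x in product(alphabet, repeat=n):
        types.add(tuple(Fraction(x.count(a), n) for a in alphabet))
    return len(types)

n, r = 4, 3
print(num_types(n, r), (n + 1) ** (r - 1))  # 15 <= 25, so the bound holds
```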
Next, consider an i.i.d. source with prior $Q$; that is, the probability of a sequence $x$ is
$$Q(x)=\prod_{i=1}^{n}Q({x}_{i}).$$
We note in passing that i.i.d. sources are sometimes called memoryless. Let the entropy be
$$H(P)=-\sum_{a\in\alpha}P(a)\log\left(P(a)\right),$$
where we use base-two logarithms throughout. We study the entropy $H\left({P}_{x}\right)$ in order to show that it is the fundamental performance limit in lossless compression.
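A minimal sketch of the base-two entropy (the function name `entropy` is ours), with the convention $0\log 0=0$:

```python
from math import log2

def entropy(p):
    """Entropy in bits, H(P) = -sum_a P(a) log2 P(a); 0*log(0) taken as 0."""
    return -sum(q * log2(q) for q in p if q > 0)

print(entropy([0.5, 0.5]))       # 1.0 bit for a fair coin
print(entropy([0.6, 0.2, 0.2]))  # ~1.371 bits: the type of the example 11321
```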
We also define the divergence as
$$D(P\parallel Q)=\sum_{a\in\alpha}P(a)\log\left(\frac{P(a)}{Q(a)}\right).$$
It is well known that the divergence is non-negative,
$$D(P\parallel Q)\ge 0.$$
Moreover, $D(P\parallel Q)=0$ if and only if the distributions are identical.
Claim 1 For an i.i.d. source with prior $Q$, the following relation holds,
$$Q(x)={2}^{-n\left(H({P}_{x})+D({P}_{x}\parallel Q)\right)}.$$
The derivation is straightforward,
$$Q(x)=\prod_{i=1}^{n}Q({x}_{i})=\prod_{a\in\alpha}Q(a)^{{n}_{x}(a)}=\prod_{a\in\alpha}{2}^{{n}_{x}(a)\log\left(Q(a)\right)}={2}^{n\sum_{a\in\alpha}{P}_{x}(a)\log\left(Q(a)\right)}={2}^{-n\left(H({P}_{x})+D({P}_{x}\parallel Q)\right)},$$
where the last step uses $\sum_{a\in\alpha}{P}_{x}(a)\log\left(Q(a)\right)=-H({P}_{x})-D({P}_{x}\parallel Q)$.
Seeing that the divergence is non-negative, and zero only if the distributions are equal, we have $Q\left(x\right)\le {2}^{-nH\left({P}_{x}\right)}={P}_{x}\left(x\right)$ , where ${P}_{x}\left(x\right)$ denotes the probability of $x$ under the i.i.d. distribution ${P}_{x}$ . When ${P}_{x}=Q$ the divergence between them is zero, and we have ${P}_{x}\left(x\right)=Q\left(x\right)={2}^{-nH\left({P}_{x}\right)}$ .
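Claim 1 can be checked numerically for the example sequence $x=11321$; the particular prior $Q$ below is hypothetical, chosen only for illustration:

```python
from collections import Counter
from math import log2, prod

def entropy(p):
    """H(P) in bits, with 0*log(0) taken as 0."""
    return -sum(pa * log2(pa) for pa in p if pa > 0)

def divergence(p, q):
    """D(P||Q) = sum_a P(a) log2(P(a)/Q(a))."""
    return sum(pa * log2(pa / qa) for pa, qa in zip(p, q) if pa > 0)

x = (1, 1, 3, 2, 1)
alphabet = (1, 2, 3)
Q = {1: 0.5, 2: 0.25, 3: 0.25}  # assumed i.i.d. prior, for illustration
n = len(x)
counts = Counter(x)
P = [counts[a] / n for a in alphabet]  # the type (0.6, 0.2, 0.2)

lhs = prod(Q[xi] for xi in x)  # Q(x) computed directly
rhs = 2 ** (-n * (entropy(P) + divergence(P, [Q[a] for a in alphabet])))
print(lhs, rhs)  # both equal 0.5^3 * 0.25^2 = 0.0078125
```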
The proof of the following theorem was discussed in class.
Theorem 2 The cardinality of the type class $T\left({P}_{x}\right)$ obeys,
$$\frac{1}{(n+1)^{r-1}}{2}^{nH({P}_{x})}\le \left|T\left({P}_{x}\right)\right|\le {2}^{nH({P}_{x})}.$$
Having computed the probability of $x$ and the cardinality of its type class, we can easily compute the probability of the type class: since all sequences in $T\left({P}_{x}\right)$ have the same probability under $Q$,
$$Q\left(T\left({P}_{x}\right)\right)=\left|T\left({P}_{x}\right)\right|\cdot Q(x)=\left|T\left({P}_{x}\right)\right|\cdot{2}^{-n\left(H({P}_{x})+D({P}_{x}\parallel Q)\right)},$$
and Theorem 2 then gives $\frac{1}{(n+1)^{r-1}}{2}^{-nD({P}_{x}\parallel Q)}\le Q\left(T\left({P}_{x}\right)\right)\le {2}^{-nD({P}_{x}\parallel Q)}$.
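The type-class probability bounds obtained by combining Claim 1 and Theorem 2 can also be checked numerically; again, the prior $Q$ is hypothetical and chosen for illustration:

```python
from collections import Counter
from math import factorial, log2, prod

x = (1, 1, 3, 2, 1)
alphabet = (1, 2, 3)
Q = {1: 0.5, 2: 0.25, 3: 0.25}  # assumed i.i.d. prior, for illustration
n, r = len(x), len(alphabet)
counts = Counter(x)
P = [counts[a] / n for a in alphabet]

T_size = factorial(n) // prod(factorial(counts[a]) for a in alphabet)
Qx = prod(Q[xi] for xi in x)
QT = T_size * Qx  # Q(T(P_x)) = |T(P_x)| Q(x), all sequences equiprobable
D = sum(p * log2(p / Q[a]) for p, a in zip(P, alphabet) if p > 0)

lower = 2 ** (-n * D) / (n + 1) ** (r - 1)  # from Theorem 2's lower bound
upper = 2 ** (-n * D)                       # from Theorem 2's upper bound
print(lower <= QT <= upper)  # True
```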