Consider $x \in \alpha^n$. The goal of lossy compression [link] is to describe $\hat{x}$, also of length $n$ but possibly defined over another reconstruction alphabet $\hat{\alpha} \ne \alpha$, such that the description requires few bits and the distortion

$$\overline{d}(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat{x}_i)$$

is small, where $d(\cdot,\cdot)$ is some distortion metric. It is well known that for every $d(\cdot,\cdot)$ and distortion level $D$ there is a minimum rate $R(D)$ such that $\hat{x}$ can be described at rate $R(D)$. The rate $R(D)$ is known as the rate distortion (RD) function; it is the fundamental information-theoretic limit of lossy compression [link], [link].
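As a concrete illustration of the per-letter distortion above, the following sketch computes $\overline{d}(x^n, \hat{x}^n)$ under Hamming distortion $d(a,b) = \mathbf{1}\{a \ne b\}$ (the function name and the choice of Hamming distortion as the default are illustrative assumptions, not part of the original text):

```python
# Average per-letter distortion d_bar(x, x_hat) = (1/n) * sum_i d(x_i, x_hat_i),
# shown here with Hamming distortion d(a, b) = 1 if a != b else 0.
def average_distortion(x, x_hat, d=lambda a, b: float(a != b)):
    assert len(x) == len(x_hat)
    return sum(d(a, b) for a, b in zip(x, x_hat)) / len(x)

print(average_distortion("00110101", "00100111"))  # 2 mismatches out of 8 -> 0.25
```

Any other single-letter distortion metric can be passed in through the `d` argument.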
The design of practical lossy compression algorithms has been a challenging problem for decades. Despite numerous applications such as image compression [link], [link], video compression [link], and speech coding [link], [link], [link], there is a significant gap between theory and practice: practical lossy compressors do not achieve the RD function, while theoretical constructions that achieve the RD function are impractical.
A promising recent algorithm by Jalali and Weissman [link] is universal in the limit of infinite runtime, and its RD performance is reasonable even with modest runtime. The main idea is that the distorted version $\hat{x}$ of the input $x$ can be computed as follows,

$$\hat{x} = \arg\min_{w^n \in \hat{\alpha}^n} \left[ H_k(w^n) - \beta\, \overline{d}(x^n, w^n) \right],$$

where $\beta < 0$ is the slope at the particular point of interest on the RD function, and $H_k(w^n)$ is the empirical conditional entropy of order $k$,

$$H_k(w^n) \triangleq \frac{1}{n} \sum_{u^k \in \hat{\alpha}^k} \sum_{a \in \hat{\alpha}} n_w(u^k, a) \log_2 \left( \frac{\sum_{b \in \hat{\alpha}} n_w(u^k, b)}{n_w(u^k, a)} \right),$$
where $u^k$ is a context of order $k$, and as before $n_w(u^k, a)$ is the number of times that the symbol $a$ appears following the context $u^k$ in $w^n$. Jalali and Weissman proved [link] that when $k = o(\log(n))$, the RD pair $(H_k(\hat{x}^n), \overline{d}(x^n, \hat{x}^n))$ converges to the RD function asymptotically in $n$. Therefore, an excellent lossy compression technique is to compute $\hat{x}$ and then compress it. Moreover, this compression can be universal. In particular, the choice of context order $k = o(\log(n))$ ensures that universal compressors for context tree sources can emulate the coding length of the empirical conditional entropy $\hat{H}_k(\hat{x}^n)$.
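The order-$k$ empirical conditional entropy can be sketched in a few lines. In this illustrative version (the function name is an assumption, and we adopt the boundary convention of simply skipping the first $k$ positions, which have incomplete contexts):

```python
from collections import Counter
from math import log2

def empirical_conditional_entropy(w, k):
    """H_k(w) = (1/n) * sum over contexts u^k and symbols a of
    n_w(u^k, a) * log2(n_w(u^k) / n_w(u^k, a))."""
    n = len(w)
    counts = Counter()          # n_w(u^k, a): times symbol a follows context u^k
    context_totals = Counter()  # n_w(u^k) = sum over a of n_w(u^k, a)
    for i in range(k, n):
        u = w[i - k:i]
        counts[(u, w[i])] += 1
        context_totals[u] += 1
    return sum(c * log2(context_totals[u] / c)
               for (u, a), c in counts.items()) / n
```

For example, `empirical_conditional_entropy("0101010101", 1)` is `0.0`, since an order-1 context fully determines the next symbol of an alternating sequence, while the order-0 entropy of `"0011"` is `1.0` bit per symbol.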
Despite this excellent potential performance, a tremendous challenge remains. Brute-force computation of the globally minimum-energy solution $\hat{x}^n$ involves an exhaustive search over exponentially many sequences and is thus infeasible. Therefore, Jalali and Weissman rely on Markov chain Monte Carlo (MCMC) [link], a stochastic relaxation approach to optimization. The crux of the matter is to define an energy function,

$$\mathcal{E}(w^n) \triangleq n \left[ H_k(w^n) - \beta\, \overline{d}(x^n, w^n) \right].$$
The Boltzmann probability mass function (pmf) is

$$p_t(w^n) \triangleq \frac{1}{Z_t} \exp \left\{ - \frac{\mathcal{E}(w^n)}{t} \right\},$$

where $t > 0$ is related to temperature in simulated annealing, and $Z_t$ is the normalization constant, which does not need to be computed.
Because it is difficult to sample from the Boltzmann pmf [link] directly, we instead use a Gibbs sampler, which computes the marginal distributions at all $n$ locations conditioned on the rest of $w^n$ being kept fixed. For each location $i$, the Gibbs sampler resamples from the distribution of $w_i$ conditioned on $w^{n \setminus i} \triangleq \{ w_j :\ j \ne i \}$ as induced by the joint pmf in [link], which is computed as follows,

$$p_t(w_i = a \mid w^{n \setminus i}) = \frac{\exp \left\{ - \mathcal{E}(w^{i-1}\, a\, w_{i+1}^n) / t \right\}}{\sum_{b \in \hat{\alpha}} \exp \left\{ - \mathcal{E}(w^{i-1}\, b\, w_{i+1}^n) / t \right\}}.$$
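The whole procedure can be sketched end to end as a minimal, self-contained annealed Gibbs sampler. Everything specific here is an illustrative assumption rather than the authors' implementation: the function names, Hamming distortion, the cooling schedule `t = 1/(1 + sweep)`, starting from the input itself, and recomputing the full energy for each candidate symbol (an efficient implementation would update the context counts incrementally instead):

```python
import math
import random
from collections import Counter

def H_k(w, k):
    # Order-k empirical conditional entropy of the sequence w.
    n = len(w)
    counts, totals = Counter(), Counter()
    for i in range(k, n):
        u = tuple(w[i - k:i])
        counts[(u, w[i])] += 1
        totals[u] += 1
    return sum(c * math.log2(totals[u] / c) for (u, a), c in counts.items()) / n

def energy(w, x, k, beta):
    # E(w) = n * [H_k(w) - beta * d_bar(x, w)], with Hamming distortion and beta < 0.
    n = len(x)
    d_bar = sum(a != b for a, b in zip(x, w)) / n
    return n * (H_k(w, k) - beta * d_bar)

def gibbs_lossy_compress(x, alphabet, k=2, beta=-2.0, sweeps=30, seed=0):
    rng = random.Random(seed)
    w = list(x)  # start the chain from the input sequence itself
    n = len(w)
    for s in range(sweeps):
        t = 1.0 / (1 + s)  # illustrative cooling schedule for the temperature
        for i in range(n):
            # Resample w_i from the Boltzmann conditional given the rest of w.
            energies = []
            for a in alphabet:
                w[i] = a
                energies.append(energy(w, x, k, beta))
            e_min = min(energies)  # subtract the minimum for numerical stability
            weights = [math.exp(-(e - e_min) / t) for e in energies]
            w[i] = rng.choices(alphabet, weights=weights)[0]
    return "".join(w)
```

As the temperature decreases over sweeps, the sampler concentrates on low-energy sequences, i.e., candidates $\hat{x}$ that trade off empirical conditional entropy against distortion; the resulting sequence would then be handed to a universal lossless compressor.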