<< Chapter < Page Chapter >> Page >
θ M L = arg max { θ n x ( 1 ) ( 1 - θ ) n x ( 0 ) } ,

and plugging this parameter θ = θ M L into p θ ( x ) minimizes the coding length among all possible parameters, θ Λ . It is readily seen that

θ M L = n x ( 1 ) n .

Suppose, however, that we were to encode with θ ' = θ M L + Δ . Then the coding length would be

l θ ( x ) = - log ( θ ' ) n x ( 1 ) ( 1 - θ ' ) n x ( 0 ) .

It can be shown that this coding length is suboptimal w.r.t. l θ M L ( x ) by n · O ( Δ 2 ) bits. Keep in mind that doubling the number of parameter levels used by ouruniversal encoder requires an extra bit to encode the extra factor of 2 in resolution. It makes sense to expend this extra bit only if it buys us at least one other bit,meaning that n · O ( Δ 2 ) = 1 , which implies that we encode θ M L to a resolution of 1 / n , corresponding to O ( n ) levels. Again, this is a redundancy of 1 2 log ( n ) bits per parameter.

Having described Rissanen's result intuitively, let us formalize matters. Consider { p θ , θ Λ } , where Λ R K is a compact set. Suppose that there exists an estimator θ ^ such that

n n ( c ) : p θ θ ^ ( x n ) - θ > c n δ ( c ) ,

where lim c δ ( c ) = 0 . Then we have the following converse result.

Theorem 6 (Converse to universal coding  [link] ) Given a parametric class that satisfies the above condition [link] , for all ϵ > 0 and all codes l that do not know θ ,

r n ( l , θ ) ( 1 - ϵ ) K 2 log ( n ) n ,

except for a class of θ in B ϵ ( n ) Λ whose Lebesgue volume shrinks to zero as n increases.

That is, a universal code cannot compress at a redundancy substantialy below 1 2 log ( n ) bits per parameter. Rissanen also proved the following achievable result in his seminal paper.

Theorem 7 (Achievable to universal coding  [link] ) If p θ ( x ) is twice differentiable in θ for every x n , then there exists a universal code such that θ Λ : r n ( l , θ ) ( 1 + ϵ ) K 2 log ( n ) n .

Universal coding for piecewise i.i.d. sources

We have emphasized stationary parametric classes, but a parametric class can be nonstationary. Let us show how universal coding can be achieved for some nonstationary classes of sourcesby providing an example. Consider Λ = { 0 , 1 , . . . , n } where

p θ ( x n ) = Q 1 ( x 1 θ ) · Q 2 ( x θ + 1 n ) ,

where Q 1 and Q 2 are both know i.i.d. sources. This is a piecewise i.i.d. source ; in each segment it is i.i.d., and there is an abrupt transition in statistics when the first segment ends and the second begins.

Here are two approaches to coding this source.

  1. Encode the best index θ M L using log ( n + 1 ) bits, then encode p θ M L ( x n ) . This is known as two-part code or plug-in ; after encoding the index, we plug the best parameter into the distribution. Clearly,
    l ( x ) = min 0 θ n - log p θ ( x ) + log ( n + 1 ) - log p θ ( x ) + log ( n + 1 ) + 2 .
  2. The second approach is a mixture , we allocate weights for all possible parameters,
    l ( x ) = - log 1 n + 1 i = 0 n p i ( x n ) < - log 1 n + 1 p θ M L ( x n ) = - log ( p θ M L ( x ) ) + log ( n + 1 ) .

Merhav  [link] provided redundancy theorems for this class of sources. Algorithmic approaches to the mixture appear in Shamir and Merhav  [link] and Willems  [link] .

The theme that is common to both approaches, the plug-in and the mixture, is that they lose approximately log ( n ) bits in encoding the location of the transition. Indeed, Merhav showed that the penalty for each transition in universal codingis approximately log ( n ) bits  [link] . Intuitively, the reason that the redundancy required to encode the location of the transition is largerthan the 1 2 log ( n ) from Rissanen  [link] is because the location of the transitionmust be described precisely to prevent paying a big coding length penalty in encoding segments using the wrong i.i.d. statistics. In contrast, in encoding our Bernoulli example an imprecision of 1 n in encoding θ M L in the first part of the code yields only an O ( 1 ) bit penalty in the second part of the code.

It is well known that mixtures out-compress the plug-in. However, in many cases they do so by only a small amount per parameter. For example, Baron et al. showed that the plug-in for i.i.d. sourcesloses approximately 1 bit per parameter w.r.t. the mixture.

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Universal algorithms in signal processing and communications. OpenStax CNX. May 16, 2013 Download for free at http://cnx.org/content/col11524/1.1
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Universal algorithms in signal processing and communications' conversation and receive update notifications?

Ask