Suppose $f$ is piecewise Lipschitz and ${f}_{k}$ is a piecewise constant approximation of $f$.
where $\Delta $ is a constant equal to the average of $f$ on the right and left sides of the discontinuity in this interval.
where ${k}^{-1}$ is the width of the interval. Notice this rate is quite slow.
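The slow rate can be checked numerically. The following is a minimal sketch, assuming a hypothetical test function (the identity plus a unit jump at $1/\sqrt{2}$) and a helper named `piecewise_constant_error` that is not part of the original notes. It fits interval averages on $k$ equal intervals and prints $k$ times the squared error, which stays bounded but does not shrink toward zero the way it would for a function without a jump.

```python
import numpy as np

def piecewise_constant_error(f, k, n_grid=2**18):
    """Squared L2 error of the interval-average fit on k equal intervals.

    Assumes k divides n_grid; the integral is approximated by a midpoint sum.
    """
    t = (np.arange(n_grid) + 0.5) / n_grid        # midpoint grid on [0, 1]
    values = f(t).reshape(k, n_grid // k)          # one row per interval
    fit = values.mean(axis=1, keepdims=True)       # best constant on each interval
    return float(np.mean((values - fit) ** 2))

JUMP = 1 / np.sqrt(2)                              # hypothetical jump location

def f(t):
    """Lipschitz pieces (slope 1) with a unit jump at JUMP."""
    return t + (t >= JUMP)

results = [(k, piecewise_constant_error(f, k)) for k in (8, 16, 32, 64, 128, 256)]
for k, err in results:
    print(f"k={k:4d}  error={err:.3e}  k*error={k * err:.3f}")

# For comparison: with no jump, k^2 * error is roughly constant (about 1/12 here).
smooth_scaled = 16**2 * piecewise_constant_error(lambda t: t, 16)
print("smooth case, k^2 * error:", smooth_scaled)
```

The single interval that straddles the jump dominates the error, which is why refining every interval uniformly is wasteful.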
This problem naturally suggests the following remedy: use very small intervals near discontinuities and larger intervals in smooth regions. Specifically, suppose we use intervals of width ${k}^{-2\alpha}$ to contain the discontinuities and intervals of width ${k}^{-1}$ elsewhere. Then the corresponding piecewise polynomial approximation ${\tilde{f}}_{k}$ satisfies $\| f-{\tilde{f}}_{k}{\| }_{{L}_{2}}^{2}=O\left({k}^{-2\alpha}\right)$ .
We can accomplish this need for "adaptive resolution" or "multiresolution" using recursive partitions and trees.
We discussed this idea already in our examination of classification trees. Here is the basic idea again, graphically.
Consider a function $f\in {B}^{\alpha}\left({C}_{\alpha}\right)$ that contains no more than $m$ points of discontinuity, and is ${H}^{\alpha}\left({C}_{\alpha}\right)$ away from these points.
Lemma. Consider a complete RDP with $n$ intervals. Then there exists an associated pruned RDP with $O\left(k\log n\right)$ intervals such that an associated piecewise degree- $\lceil \alpha \rceil $ polynomial approximation ${\tilde{f}}_{k}$ has a squared approximation error of $O\left(\max ({k}^{-2\alpha},{n}^{-1})\right)$ .
Assume $n>k>m$ . Divide $[0,1]$ into $k$ intervals. If $f$ is smooth on a particular interval $I$ , then $|f\left(t\right)-{\tilde{f}}_{k}\left(t\right)|=O\left({k}^{-\alpha}\right)$ for all $t\in I$ .
In intervals that contain a discontinuity, recursively subdivide into two until the discontinuity is contained in an interval of width ${n}^{-1}$ . This process results in at most ${\log}_{2}n$ additional subintervals per discontinuity, and the squared approximation error is $O\left({k}^{-2\alpha}\right)$ on all of them except the $m$ intervals of width ${n}^{-1}$ containing the discontinuities, where the error is $O\left(1\right)$ at each point.
Thus, the overall squared ${L}_{2}$ error is $\| f-{\tilde{f}}_{k}{\| }_{{L}_{2}}^{2}=O\left({k}^{-2\alpha}+{n}^{-1}\right)$ ,
and there are at most $k+m{\log}_{2}n$ intervals in the partition. Since $k>m$ , we can upper bound the number of intervals by $2k{\log}_{2}n$ .
Note that if the initial complete RDP has $n\approx {k}^{2\alpha}$ intervals, then the squared error is $O\left({k}^{-2\alpha}\right)$ .
Thus, we only incur a factor of $2\alpha \log k$ additional leaves and achieve the same overall approximation error as in the ${H}^{\alpha}\left({C}_{\alpha}\right)$ case. We will see that this is a small price to pay in order to handle not only smooth functions, but also piecewise smooth functions.
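The partition used in the proof can be sketched in a few lines. This is an illustrative sketch only: the function name `adaptive_partition`, the jump locations, and the choice of $k$ and $n$ as powers of two (so that every interval is dyadic) are assumptions made for the example. It starts from $k$ equal intervals, bisects any interval that contains a jump until its width reaches ${n}^{-1}$ , and compares the number of leaves with the bounds from the proof.

```python
import math

def adaptive_partition(k, n, jumps):
    """Start from k equal intervals of [0, 1]; bisect every interval that
    contains a jump until its width is no larger than 1/n."""
    stack = [(i / k, (i + 1) / k) for i in range(k)]
    leaves = []
    while stack:
        a, b = stack.pop()
        contains_jump = any(a < t < b for t in jumps)
        if contains_jump and (b - a) > 1.0 / n:
            mid = (a + b) / 2
            stack += [(a, mid), (mid, b)]          # each split adds one interval
        else:
            leaves.append((a, b))
    return sorted(leaves)

k, n = 16, 1024                  # hypothetical sizes, both powers of two
jumps = [1 / 3, 0.7123]          # hypothetical (non-dyadic) jump locations
m = len(jumps)

leaves = adaptive_partition(k, n, jumps)
narrowest = min(b - a for a, b in leaves)

print("number of intervals:", len(leaves))
print("bound k + m*log2(n):", k + m * math.log2(n))
print("bound 2*k*log2(n):  ", 2 * k * math.log2(n))
print("narrowest interval: ", narrowest)
```

Only the intervals along the path to each jump are refined, so the leaf count grows logarithmically in $n$ rather than linearly.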
Let $f\in {L}^{2}\left([0,1]\right)$ ; that is, ${\int}_{0}^{1}{f}^{2}\left(t\right)dt<\infty $ .
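A wavelet approximation is a series of the form $f={c}_{0}+{\sum}_{j\ge 0}{\sum}_{k=1}^{{2}^{j}}\langle f,{\psi}_{j,k}\rangle {\psi}_{j,k}$ ,
where ${c}_{0}$ is a constant $({c}_{0}={\int}_{0}^{1}f\left(t\right)dt)$ , $\langle f,{\psi}_{j,k}\rangle ={\int}_{0}^{1}f\left(t\right){\psi}_{j,k}\left(t\right)dt$ ,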
and the basis functions ${\psi}_{j,k}$ are orthonormal, oscillatory signals, each with an associated scale ${2}^{-j}$ and position $k{2}^{-j}$ . ${\psi}_{j,k}$ is called the wavelet at scale ${2}^{-j}$ and position $k{2}^{-j}$ .
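Suppose $f$ is piecewise constant with at most $m$ discontinuities. Let ${f}_{J}={c}_{0}+{\sum}_{j=0}^{J-1}{\sum}_{k=1}^{{2}^{j}}\langle f,{\psi}_{j,k}\rangle {\psi}_{j,k}$ , where the ${\psi}_{j,k}$ are Haar wavelets.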
Then, ${f}_{J}$ has at most $mJ$ non-zero wavelet coefficients; i.e., $\langle f,{\psi}_{j,k}\rangle =0$ for all but $mJ$ terms, since at most one Haar wavelet at each scale senses each point of discontinuity. Said another way, all but at most $m$ of the wavelets at each scale have support over constant regions of $f$ .
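The sparsity claim is easy to check on samples. Below is a minimal sketch, assuming a hand-written discrete Haar transform (`haar_detail_levels` is a name chosen for this example, not a library function) and a hypothetical piecewise constant signal with $m=2$ jumps sampled at $n={2}^{J}$ points. It counts the non-zero detail coefficients at each scale.

```python
import numpy as np

def haar_detail_levels(samples):
    """Orthonormal discrete Haar transform of a length-2^J signal.

    Returns the detail (wavelet) coefficients, one array per scale,
    ordered from the finest scale to the coarsest.
    """
    x = np.asarray(samples, dtype=float)
    levels = []
    while x.size > 1:
        even, odd = x[0::2], x[1::2]
        levels.append((even - odd) / np.sqrt(2))   # wavelet coefficients
        x = (even + odd) / np.sqrt(2)              # scaling coefficients
    return levels

J = 10
n = 2**J
t = np.arange(n) / n
# Hypothetical piecewise constant signal with jumps at 0.3 and 0.7.
f = np.where(t < 0.3, 1.0, np.where(t < 0.7, 3.0, -2.0))
m = 2

counts = [int(np.count_nonzero(np.abs(d) > 1e-12)) for d in haar_detail_levels(f)]
print("non-zero coefficients per scale (fine to coarse):", counts)
print("total:", sum(counts), " bound m*J:", m * J, " out of", n - 1, "coefficients")
```

Every wavelet whose support lies inside a constant region produces an exactly zero coefficient, so at most $m$ coefficients per scale survive.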
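${f}_{J}$ itself is piecewise constant, with discontinuities possible only at the endpoints of the intervals $[{2}^{-J}(k-1),{2}^{-J}k]$ . Therefore, in this case $f$ and ${f}_{J}$ can differ only on the at most $m$ intervals of width ${2}^{-J}$ that contain a discontinuity, so $\| f-{f}_{J}{\| }_{{L}_{2}}^{2}=O\left({2}^{-J}\right)$ .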
Daubechies wavelets are an extension of the Haar wavelet idea. Haar wavelets have one "vanishing moment": $\int {\psi}_{j,k}\left(t\right)dt=0$ .
Daubechies wavelets are "smoother" basis functions with extra vanishing moments. The Daubechies- $N$ wavelet has $N$ vanishing moments.
The Daubechies-1 wavelet is just the Haar case.
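Vanishing moments can be illustrated with the discrete filters. The sketch below uses the standard 4-tap Daubechies filter, which has two vanishing moments (Daubechies-2 in the naming used here), alongside the Haar filter. The helper `discrete_moment` and the linear test signal are assumptions made for the example. A filter with two vanishing moments annihilates both constant and linear signals, while Haar annihilates only constants.

```python
import numpy as np

s3 = np.sqrt(3.0)
# Standard 4-tap Daubechies scaling filter and its associated wavelet filter.
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))
g = np.array([h[3], -h[2], h[1], -h[0]])
# Haar wavelet filter (one vanishing moment).
haar_g = np.array([1.0, -1.0]) / np.sqrt(2)

def discrete_moment(filt, order):
    """Discrete moment sum_k k^order * filt[k]."""
    idx = np.arange(len(filt))
    return float(np.sum(idx**order * filt))

for name, filt in (("Haar", haar_g), ("Daubechies 4-tap", g)):
    print(name, "moments 0 and 1:",
          discrete_moment(filt, 0), discrete_moment(filt, 1))

# Detail coefficients of a hypothetical linear signal (interior samples only).
ramp = 2.0 + 0.5 * np.arange(64)
haar_details = np.correlate(ramp, haar_g, mode="valid")[::2]
db_details = np.correlate(ramp, g, mode="valid")[::2]
print("largest Haar detail on the ramp:      ", np.max(np.abs(haar_details)))
print("largest Daubechies detail on the ramp:", np.max(np.abs(db_details)))
```

Boundary handling is deliberately left out: `mode="valid"` keeps only positions where the filter fits entirely inside the signal.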
If $f$ is a piecewise polynomial of degree $\le N$ with at most $m$ pieces, then, using the Daubechies- $N$ wavelet system, the truncated series ${f}_{J}$ has at most $O\left(mJ\right)$ non-zero wavelet coefficients. ${f}_{J}$ is called the Discrete Wavelet Transform (DWT) approximation of $f$ . The key idea is the same as we saw with trees.
We can also use DWT's to analyze and represent discrete, sampled functions. Suppose,
then we can write $\underline{f}$ as
where
is a discrete time analog of the continuous time wavelets we considered before. In particular,
for the Daubechies- $N$ discrete wavelets.
Thus, we also have an analogous approximation result: If $\underline{f}$ are samples from a piecewise degree $\le N$ polynomial function with a finite number $m$ of discontinuities, then $\underline{f}$ has $O\left(mJ\right)$ non-zero wavelet coefficients.
Suppose $f\in {B}^{\alpha}\left({C}_{\alpha}\right)$ and has a finite number of discontinuities. Let ${f}_{p}$ denote a piecewise degree- $N$ ( $N=\lceil \alpha \rceil $ ) polynomial approximation to $f$ with $O\left(k\right)$ pieces, obtained from a uniform partition into $k$ equal-length intervals followed by additional splits at the points of discontinuity.
Then
and ${\underline{f}}_{p}$ has $O\left(k{\log}_{2}n\right)$ non-zero coefficients according to our previous analysis.
Suppose $f$ is a 2-D image that is piecewise polynomial:
A pruned RDP of $k$ squares decorated with polyfits gives
Let $\underline{f}={\left[f(i/k,j/k)\right]}_{i,j=1}^{n}$ sample range.
then
$O\left(1\right)$ error on $k$ of the ${k}^{2}$ pixels, and near zero elsewhere. The DWT of $\underline{f}$ has $O\left(k\right)$ non-zero wavelet coefficients: $O\left({2}^{j}\right)$ at scale ${2}^{-j}$ , $j=0,1,...,\log n$ .
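The per-scale count can be checked on a simple image. The following is a rough sketch under stated assumptions: a hand-written 2-D Haar transform (`haar2d_nonzero_counts` is a name made up for this example), and a hypothetical $256\times 256$ piecewise constant image whose two regions are separated by a straight edge. At a scale with ${2}^{j}\times {2}^{j}$ blocks, only blocks crossed by the edge can produce non-zero coefficients, so the count should grow like ${2}^{j}$ rather than ${4}^{j}$ .

```python
import numpy as np

def haar2d_nonzero_counts(image, tol=1e-9):
    """Count non-zero 2-D Haar detail coefficients at each scale.

    Expects a square image with side 2^J. Returns counts ordered from the
    finest scale to the coarsest.
    """
    a = np.asarray(image, dtype=float)
    counts = []
    while a.shape[0] > 1:
        p, q = a[0::2, 0::2], a[0::2, 1::2]        # the four pixels of each 2x2 block
        r, s = a[1::2, 0::2], a[1::2, 1::2]
        details = ((p - q + r - s) / 2,            # horizontal differences
                   (p + q - r - s) / 2,            # vertical differences
                   (p - q - r + s) / 2)            # diagonal differences
        counts.append(sum(int(np.count_nonzero(np.abs(d) > tol)) for d in details))
        a = (p + q + r + s) / 2                    # coarse approximation
    return counts

n = 256
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
# Hypothetical image: 0 on one side of a straight edge, 1 on the other.
image = (j / n > 0.3 + 0.4 * i / n).astype(float)

counts = haar2d_nonzero_counts(image)
blocks_per_side = [n >> (level + 1) for level in range(len(counts))]
for b, c in zip(blocks_per_side, counts):
    print(f"{b:4d} x {b:<4d} blocks: {c:5d} non-zero coefficients")
print("total:", sum(counts), "of", n * n - 1)
```

The total number of non-zero coefficients is proportional to the side length of the image, not to the number of pixels.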