We define our estimator as
$$\widehat{f}_n^H = \arg\min_{f\in\mathcal{F}^H}\left\{\widehat{R}_n(f) + \mathrm{pen}(f)\right\},$$
where
$$\widehat{R}_n(f) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{f(X_i)\ne Y_i\}$$
and
$$\mathrm{pen}(f) = \sqrt{\frac{c_H(f)\log 2 + \frac{1}{2}\log n}{2n}},$$
with $c_H(f)$ the length (in bits) of the prefix code for $f$. Therefore ${\widehat{f}}_{n}^{H}$ minimizes
$$\widehat{R}_n(f) + \sqrt{\frac{c_H(f)\log 2 + \frac{1}{2}\log n}{2n}}$$
over all $f\in {\mathcal{F}}^{H}$. We showed before that
$$E\left[R\left(\widehat{f}_n^H\right)\right] - R^* \le \min_{f\in\mathcal{F}^H}\left\{R(f) - R^* + \sqrt{\frac{c_H(f)\log 2 + \frac{1}{2}\log n}{2n}}\right\} + \frac{1}{\sqrt{n}}.$$
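To make the selection rule concrete, here is a minimal Python sketch of penalized empirical risk minimization over histogram rules. The helper names (`histogram_rule`, `penalized_erm`) and the majority-vote fitting are my own illustration, not part of the notes; the penalty follows the prefix-code form $\sqrt{(c(f)\log 2 + \frac{1}{2}\log n)/(2n)}$ with a user-supplied codelength function.

```python
import math

def empirical_risk(f, X, Y):
    # fraction of training points misclassified by the rule f
    return sum(f(x) != y for x, y in zip(X, Y)) / len(Y)

def histogram_rule(k, X, Y):
    # majority vote inside each of the k*k equal-width bins of [0,1]^2
    votes = {}
    for (x1, x2), y in zip(X, Y):
        b = (min(int(x1 * k), k - 1), min(int(x2 * k), k - 1))
        votes.setdefault(b, []).append(y)
    labels = {b: int(sum(v) > len(v) / 2) for b, v in votes.items()}
    def f(x):
        b = (min(int(x[0] * k), k - 1), min(int(x[1] * k), k - 1))
        return labels.get(b, 0)  # default label for empty bins
    return f

def penalized_erm(X, Y, k_max, codelength):
    # minimize R_n(f) + sqrt((c(f) log 2 + 0.5 log n) / (2n)) over k
    n = len(Y)
    best = None
    for k in range(1, k_max + 1):
        f = histogram_rule(k, X, Y)
        pen = math.sqrt((codelength(k) * math.log(2) + 0.5 * math.log(n))
                        / (2 * n))
        obj = empirical_risk(f, X, Y) + pen
        if best is None or obj < best[0]:
            best = (obj, k, f)
    return best
```

With a boundary aligned to the grid, the penalty steers the choice away from needlessly fine partitions (large $k$ lowers the empirical risk only at the price of a larger codelength).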
To proceed with our analysis we need to make some assumptions on the intrinsic difficulty of the problem. We will assume that the Bayes decision boundary is a “well-behaved” 1-dimensional set, in the sense that it has box-counting dimension one (see Appendix "Box Counting Dimension"). This implies that, for a histogram with ${k}^{2}$ bins, the Bayes decision boundary intersects fewer than $Ck$ bins, where $C$ is a constant that does not depend on $k$. Furthermore, we assume that the marginal distribution of $X$ satisfies ${P}_{X}\left(A\right)\le K\left|A\right|$ for any measurable subset $A\subseteq {[0,1]}^{2}$. This means that the samples collected do not accumulate anywhere in the unit square.
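The box-counting condition can be checked numerically for simple boundaries. Below is an illustrative Python sketch (the function name and the sampling scheme are my own) that counts how many cells of a $k\times k$ grid the boundary curve $x_2 = g(x_1)$ passes through; for a Lipschitz $g$ the count grows linearly in $k$, in line with the $Ck$ bound.

```python
def bins_hit_by_graph(g, k, samples_per_bin=100):
    # count the k*k grid cells of [0,1]^2 visited by the curve x2 = g(x1),
    # approximated by dense sampling along the x1 axis
    hit = set()
    m = k * samples_per_bin
    for i in range(m + 1):
        x1 = i / m
        x2 = g(x1)
        if 0.0 <= x2 <= 1.0:
            hit.add((min(int(x1 * k), k - 1), min(int(x2 * k), k - 1)))
    return len(hit)
```

Within each of the $k$ columns a curve with Lipschitz constant $L$ moves vertically by at most $L/k$, so it meets at most about $L+2$ cells there, giving a total of order $k$ rather than $k^2$.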
Under the above assumptions we can conclude that
$$\min_{f\in\mathcal{F}_k^H} R(f) - R^* \le K\,\frac{Ck}{k^2} = \frac{CK}{k},$$
since the best histogram rule can only err on the (fewer than $Ck$) bins that intersect the boundary, each of volume $1/k^2$. Therefore, using a prefix code of about $k^2$ bits for a histogram with $k^2$ bins,
$$E\left[R\left(\widehat{f}_n^H\right)\right] - R^* \le \frac{CK}{k} + \sqrt{\frac{k^2\log 2 + \frac{1}{2}\log n}{2n}} + \frac{1}{\sqrt{n}} = O\left(\frac{1}{k}\right) + O\left(\sqrt{\frac{k^2}{n}}\right).$$
We can balance the terms on the right-hand side of the above expression by taking $k={n}^{1/4}$ (for $n$ large), therefore
$$E\left[R\left(\widehat{f}_n^H\right)\right] - R^* = O\left(n^{-1/4}\right).$$
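The choice $k = n^{1/4}$ comes from equating the approximation and estimation terms:
$$\frac{1}{k} \asymp \sqrt{\frac{k^2}{n}} \iff \frac{1}{k^2} \asymp \frac{k^2}{n} \iff k^4 \asymp n \iff k \asymp n^{1/4},$$
at which point both terms are of order $n^{-1/4}$.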
Now let's consider dyadic decision trees, under the assumptions above, and contrast these with the histogram classifier. Let
$$\mathcal{F}_k^T = \left\{\text{dyadic tree classifiers with } k \text{ leaves}\right\}.$$
Let ${\mathcal{F}}^{T}={\bigcup}_{k\ge 1}{\mathcal{F}}_{k}^{T}$ . We can prefix encode each element $f$ of ${\mathcal{F}}^{T}$ with ${c}_{T}\left(f\right)=3k-1$ bits, as described before.
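The $3k-1$ count can be reproduced with a small encoder. The sketch below uses a hypothetical representation (a leaf is its 0/1 label, an internal node a pair of children): one bit per internal node and two bits per leaf (a marker plus the label). A binary tree with $k$ leaves has $k-1$ internal nodes, giving $(k-1)+2k = 3k-1$ bits, and the preorder layout makes the code prefix-free.

```python
def encode_tree(node):
    # preorder bits: '1' = internal node (children follow),
    # '0' + label bit = leaf; total 3k - 1 bits for k leaves
    if isinstance(node, tuple):
        left, right = node
        return '1' + encode_tree(left) + encode_tree(right)
    return '0' + str(node)

def decode_tree(bits, i=0):
    # inverse of encode_tree; returns (tree, next index). The decoder
    # always knows when a codeword ends, which is the prefix property
    if bits[i] == '1':
        left, i = decode_tree(bits, i + 1)
        right, i = decode_tree(bits, i)
        return (left, right), i
    return int(bits[i + 1]), i + 2
```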
Let
$$\widehat{f}_n^T = \arg\min_{f\in\mathcal{F}^T}\left\{\widehat{R}_n(f) + \mathrm{pen}(f)\right\},$$
where
$$\widehat{R}_n(f) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{f(X_i)\ne Y_i\}$$
and
$$\mathrm{pen}(f) = \sqrt{\frac{c_T(f)\log 2 + \frac{1}{2}\log n}{2n}}.$$
Hence ${\widehat{f}}_{n}^{T}$ minimizes
$$\widehat{R}_n(f) + \sqrt{\frac{c_T(f)\log 2 + \frac{1}{2}\log n}{2n}}$$
over all $f\in {\mathcal{F}}^{T}$. Moreover
$$E\left[R\left(\widehat{f}_n^T\right)\right] - R^* \le \min_{f\in\mathcal{F}^T}\left\{R(f) - R^* + \sqrt{\frac{c_T(f)\log 2 + \frac{1}{2}\log n}{2n}}\right\} + \frac{1}{\sqrt{n}}.$$
If the Bayes decision boundary is a 1-dimensional set, as in "Histogram Risk Bound", there exists a tree with at most $8Ck$ leaves such that the boundary is contained in at most $Ck$ squares, each of volume $1/{k}^{2}$. To see this, start with a tree yielding the histogram partition with ${k}^{2}$ boxes (i.e., the tree partitioning the unit square into ${k}^{2}$ equal-sized squares). Now prune all the nodes that do not intersect the boundary. In [link] we illustrate the procedure. If you carefully bound the number of leaves you need at each level, you can show that in total you will have fewer than $8Ck$ leaves. We conclude then that there exists a tree with at most $8Ck$ leaves that has the same risk as a histogram with $O\left({k}^{2}\right)$ bins. Therefore, using [link] we have
$$E\left[R\left(\widehat{f}_n^T\right)\right] - R^* \le \frac{CK}{k} + \sqrt{\frac{\left(3\cdot 8Ck - 1\right)\log 2 + \frac{1}{2}\log n}{2n}} + \frac{1}{\sqrt{n}} = O\left(\frac{1}{k}\right) + O\left(\sqrt{\frac{k}{n}}\right).$$
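The pruning construction can be simulated. The sketch below is a simplification I chose for illustration: it splits squares four ways at once (a quadtree stand-in for the binary dyadic splits), refines only cells that meet the boundary, and counts the resulting leaves; for the diagonal boundary $x_2 = x_1$ the count stays linear in $k$ rather than the $k^2$ of the full histogram grid.

```python
def pruned_leaf_count(intersects, depth):
    # Refine [0,1]^2 by quartering squares, but only while a square
    # still meets the boundary; everything else stays a leaf.
    # intersects(x, y, s) says whether the boundary meets [x,x+s] x [y,y+s].
    def count(x, y, s, d):
        if d == depth or not intersects(x, y, s):
            return 1  # this cell is a leaf
        h = s / 2
        return (count(x, y, h, d + 1) + count(x + h, y, h, d + 1) +
                count(x, y + h, h, d + 1) + count(x + h, y + h, h, d + 1))
    return count(0.0, 0.0, 1.0, 0)
```

At each level only $O(k)$ squares touch the boundary, so only those are split further; all other squares freeze as leaves, which is exactly why the pruned tree stays small.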
We can balance the terms on the right-hand side of the above expression by taking $k={n}^{1/3}$ (for $n$ large), therefore
$$E\left[R\left(\widehat{f}_n^T\right)\right] - R^* = O\left(n^{-1/3}\right).$$
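For the trees the balance is
$$\frac{1}{k} \asymp \sqrt{\frac{k}{n}} \iff \frac{1}{k^2} \asymp \frac{k}{n} \iff k^3 \asymp n \iff k \asymp n^{1/3},$$
and both terms are then of order $n^{-1/3}$, a faster rate than the histogram's $n^{-1/4}$, because the tree's codelength grows only linearly in $k$.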
Trees generally work much better than histogram classifiers. This is essentially because they provide much more efficient ways of approximating the Bayes decision boundary: as we saw in our example, under reasonable assumptions on the Bayes boundary, a tree encoded with $O\left(k\right)$ bits can describe the same classifier as a histogram that requires $O\left({k}^{2}\right)$ bits.