Notice that we cannot form a density estimate by simply differentiating the empirical CDF, since this function contains discontinuities at the sample locations ${X}_{i}$ . Rather, we need to estimate the probability that a random variable will fall within a particular interval of the real axis. In this section, we will describe a common method known as the histogram .
Our goal is to estimate an arbitrary probability density function, ${f}_{X}\left(x\right)$ , within a finite region of the $x$ -axis. We will do this by partitioning the region into $L$ equally spaced subintervals, or “bins”, and forming an approximation for ${f}_{X}\left(x\right)$ within each bin. Let our region of support start at the value ${x}_{0}$ , and end at ${x}_{L}$ . Our $L$ subintervals of this region will be $[{x}_{0},{x}_{1}]$ , $({x}_{1},{x}_{2}]$ , ..., $({x}_{L-1},{x}_{L}]$ . To simplify our notation we will define $bin\left(k\right)$ to represent the interval $({x}_{k-1},{x}_{k}]$ , $k=1,2,\cdots ,L$ , and define the quantity $\Delta $ to be the length of each subinterval.
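The binning scheme above can be sketched numerically. The following Python/NumPy snippet (a sketch only; the lab itself uses Matlab) builds the bin edges for the values $L=20$ , ${x}_{0}=0$ , ${x}_{L}=1$ used later in this lab, and assigns a few sample values to their bins:

```python
import numpy as np

# Partition [x0, xL] into L equally spaced bins (x0=0, xL=1, L=20,
# matching the values used later in this lab).
x0, xL, L = 0.0, 1.0, 20
edges = np.linspace(x0, xL, L + 1)   # x_0, x_1, ..., x_L
delta = (xL - x0) / L                # width of each bin

# bin(k) is the half-open interval (x_{k-1}, x_k]; np.digitize with
# right=True against the interior edges returns k-1 for such samples.
samples = np.array([0.03, 0.5, 0.999])
k = np.digitize(samples, edges[1:-1], right=True) + 1
print(delta)  # 0.05
print(k)      # [ 1 10 20]
```

Here `samples`, and the variable names generally, are illustrative choices, not part of the lab.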
We will also define $\tilde{f}\left(k\right)$ to be the probability that $X$ falls into $bin\left(k\right)$ :

$\tilde{f}\left(k\right)=P\left(X\in bin\left(k\right)\right)={\int }_{{x}_{k-1}}^{{x}_{k}}{f}_{X}\left(x\right)dx\approx {f}_{X}\left(x\right)\Delta \phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{4pt}{0ex}}x\in bin\left(k\right).$

This approximation only holds for an appropriately small bin width $\Delta $ .
Next we introduce the concept of a histogram of a collection of i.i.d. random variables $\{{X}_{1},{X}_{2},\cdots ,{X}_{N}\}$ . Let us start by defining an indicator function that shows whether or not the random variable ${X}_{n}$ falls within $bin\left(k\right)$ :

${I}_{n}\left(k\right)=\left\{\begin{array}{cc}1,& {X}_{n}\in bin\left(k\right)\\ 0,& \text{otherwise.}\end{array}\right.$

The histogram of the ${X}_{n}$ 's at $bin\left(k\right)$ , denoted as $H\left(k\right)$ , is simply the number of random variables that fall within $bin\left(k\right)$ . This can be written as

$H\left(k\right)=\sum _{n=1}^{N}{I}_{n}\left(k\right).$
We can show that the normalized histogram, $H\left(k\right)/N$ , is an unbiased estimate of the probability of $X$ falling in $bin\left(k\right)$ . Let us compute the expected value of the normalized histogram:

$E\left[\frac{H\left(k\right)}{N}\right]=\frac{1}{N}\sum _{n=1}^{N}P\left({X}_{n}\in bin\left(k\right)\right)=\tilde{f}\left(k\right).$

The last equality results from the definition of $\tilde{f}\left(k\right)$ , and from the assumption that the ${X}_{n}$ 's have the same distribution. A similar argument may be used to show that the variance of $H\left(k\right)$ is given by

$Var\left[H\left(k\right)\right]=N\tilde{f}\left(k\right)\left(1-\tilde{f}\left(k\right)\right).$
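Since $H\left(k\right)$ counts how many of $N$ independent samples land in $bin\left(k\right)$ , it is a binomial random variable with parameters $N$ and $\tilde{f}\left(k\right)$ , which is where the mean and variance above come from. A quick Python/NumPy check of both facts (a sketch, not part of the lab; the probability value `p` is an arbitrary stand-in for $\tilde{f}\left(k\right)$ ):

```python
import numpy as np

# H(k) is Binomial(N, f~(k)): E[H(k)/N] = f~(k), Var[H(k)] = N f~(k)(1 - f~(k)).
rng = np.random.default_rng(0)
N, trials = 100, 20000
p = 0.3                               # stand-in for the bin probability f~(k)

# Simulate H(k) many times: each draw counts successes among N Bernoulli(p) trials.
H = rng.binomial(N, p, size=trials)

print(abs(H.mean() / N - p) < 0.01)          # normalized histogram is ~unbiased
print(abs(H.var() - N * p * (1 - p)) < 1.0)  # sample variance ~ N p (1 - p)
```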
Therefore, as $N$ grows large, the variance of $H\left(k\right)/N$ approaches zero, and the bin probabilities $\tilde{f}\left(k\right)$ can be approximated by the normalized histogram $H\left(k\right)/N$ .
Combining this with the small-bin approximation $\tilde{f}\left(k\right)\approx {f}_{X}\left(x\right)\Delta $ , we may then approximate the density function ${f}_{X}\left(x\right)$ within $bin\left(k\right)$ by

${f}_{X}\left(x\right)\approx \frac{H\left(k\right)}{N\Delta }\phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{4pt}{0ex}}x\in bin\left(k\right).$
Notice this estimate is a staircase function of $x$ which is constant over each interval $bin\left(k\right)$ . It can also easily be verified that this density estimate integrates to 1.
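The normalization can be verified numerically: summing the staircase estimate over all bins gives $\sum _{k}\frac{H\left(k\right)}{N\Delta }\Delta =\frac{1}{N}\sum _{k}H\left(k\right)=1$ whenever every sample lands in the region of support. A Python/NumPy sketch (the sample data and names are illustrative):

```python
import numpy as np

# Histogram density estimate: f_hat(x) = H(k) / (N * delta) for x in bin(k).
rng = np.random.default_rng(1)
N, L = 1000, 20
X = rng.random(N)                       # N samples, all inside [0, 1)

H, edges = np.histogram(X, bins=L, range=(0.0, 1.0))
delta = edges[1] - edges[0]
f_hat = H / (N * delta)                 # constant value on each bin

# Riemann sum of the staircase estimate: sum(H)/N = 1.
print(np.sum(f_hat * delta))            # 1.0 (up to floating point)
```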
Let $U$ be a uniformly distributed random variable on the interval [0,1] with the following cumulative probability distribution, ${F}_{U}\left(u\right)$ :

${F}_{U}\left(u\right)=\left\{\begin{array}{cc}0,& u<0\\ u,& 0\le u\le 1\\ 1,& u>1.\end{array}\right.$
We can calculate the cumulative probability distribution for the new random variable $X={U}^{\frac{1}{3}}$ :

${F}_{X}\left(x\right)=P\left(X\le x\right)=P\left({U}^{\frac{1}{3}}\le x\right)=P\left(U\le {x}^{3}\right)={F}_{U}\left({x}^{3}\right).$
Plot ${F}_{X}\left(x\right)$ for $x\in [0,1]$ . Also, analytically calculate the probability density ${f}_{X}\left(x\right)$ , and plot it for $x\in [0,1]$ .
Using $L=20$ , ${x}_{0}=0$ and ${x}_{L}=1$ , use Matlab to compute $\tilde{f}\left(k\right)$ , the probability of $X$ falling into $bin\left(k\right)$ .
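For a numerical cross-check of this step: since ${F}_{X}\left(x\right)={x}^{3}$ on $[0,1]$ , the bin probabilities follow as $\tilde{f}\left(k\right)={F}_{X}\left({x}_{k}\right)-{F}_{X}\left({x}_{k-1}\right)$ . A Python/NumPy sketch of the same computation the lab asks for in Matlab:

```python
import numpy as np

# Bin probabilities for X = U^(1/3), U uniform on [0,1]:
# F_X(x) = P(U <= x^3) = x^3 on [0,1], so f~(k) = x_k^3 - x_{k-1}^3.
L = 20
edges = np.linspace(0.0, 1.0, L + 1)
f_tilde = edges[1:] ** 3 - edges[:-1] ** 3

print(f_tilde.sum())    # 1.0: the bins cover the whole support [0, 1]
print(f_tilde[-1])      # largest bin probability, since f_X grows on [0, 1]
```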
Use the stem function to plot $\tilde{f}\left(k\right)$ , and put all three plots on a single figure using subplot .

Generate 1000 samples of a random variable $U$ that is uniformly distributed between 0 and 1 (using the rand command). Then form the random vector $X$ by computing $X={U}^{\frac{1}{3}}$ . Use the Matlab function hist to plot a normalized histogram for your samples of $X$ , using 20 bins uniformly spaced on the interval $[0,1]$ . Use the command H=hist(X,(0.5:19.5)/20) to obtain the histogram, and then normalize H . Use the stem command to plot the normalized histogram $H\left(k\right)/N$ and $\tilde{f}\left(k\right)$ together on the same figure using subplot .
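The steps above can be sketched in Python/NumPy as follows (an illustrative analogue, not the lab's Matlab solution; the seed and variable names are arbitrary, and the stem / subplot plotting steps are omitted):

```python
import numpy as np

# Python/NumPy analogue of the Matlab steps: sample U, form X = U^(1/3),
# build the normalized histogram, and compare it to the bin probabilities.
rng = np.random.default_rng(438)
N, L = 1000, 20

U = rng.random(N)                 # ~ rand: uniform samples on [0, 1)
X = U ** (1.0 / 3.0)              # transformed samples

# ~ H = hist(X, (0.5:19.5)/20): counts over 20 uniform bins on [0, 1]
H, edges = np.histogram(X, bins=L, range=(0.0, 1.0))
H_norm = H / N                    # normalized histogram H(k)/N

# True bin probabilities from F_X(x) = x^3 on [0, 1]
f_tilde = edges[1:] ** 3 - edges[:-1] ** 3

print(H_norm.sum())                              # 1.0: all samples are binned
print(np.max(np.abs(H_norm - f_tilde)) < 0.05)   # close to f~(k) for N = 1000
```

The final comparison is the numerical content of the stem / subplot plot the lab asks for: the normalized histogram should track $\tilde{f}\left(k\right)$ closely at this sample size.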