The backpropagation method is an example of a wide class of training methods based on the information carried by the gradient of the error function. The independent variables in this minimization are the weights of the neural network, and the criterion to be minimized is the sum-of-squares error.
Let us consider a training set composed of $L$ ordered pairs of the following form: $$\left\{\left({x}^{\left(1\right)},{d}^{\left(1\right)}\right),\left({x}^{\left(2\right)},{d}^{\left(2\right)}\right),\ldots,\left({x}^{\left(L\right)},{d}^{\left(L\right)}\right)\right\}$$ Furthermore, let us define the total error $E$ generated at the outputs of the neural network after presenting the entire training set as: $$E=\sum _{l=1}^{L}{E}^{\left(l\right)}$$ where: $${E}^{\left(l\right)}=\sum _{m=1}^{M}{E}_{m}^{\left(l\right)}=\frac{1}{2}\sum _{m=1}^{M}{\left({d}_{m}^{\left(l\right)}-{y}_{m}^{\left(l\right)}\right)}^{2}$$ As stated above, the independent variables in the minimization of the error $E$ are the weights ${w}_{\mathrm{ij}}$. Since the number of weights is large even for relatively small networks, in real applications training a neural network amounts to minimizing a scalar field over a vector space with hundreds or, more often, thousands of dimensions. One of the minimization techniques suitable for such a problem is the steepest descent method.
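The definitions above can be sketched in code. The following is a minimal illustration, not part of the original text: a single sigmoid neuron (so $M=1$) trained by steepest descent on the total error $E=\sum_l \frac{1}{2}\left(d^{(l)}-y^{(l)}\right)^2$. The toy training set (logical OR with a constant bias input), the learning rate `eta`, and the epoch count are all assumptions chosen for the example.

```python
import math

def sigmoid(s):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-s))

def total_error(weights, data):
    """Total error E = sum over pairs of (1/2)(d - y)^2, as defined above."""
    E = 0.0
    for x, d in data:
        y = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        E += 0.5 * (d - y) ** 2
    return E

def train(data, n_inputs, eta=0.5, epochs=2000):
    """Steepest descent on E over the weight vector (batch updates)."""
    w = [0.0] * n_inputs
    for _ in range(epochs):
        grad = [0.0] * n_inputs
        for x, d in data:
            s = sum(wi * xi for wi, xi in zip(w, x))
            y = sigmoid(s)
            # dE/ds for one pair: -(d - y) * y * (1 - y)
            delta = -(d - y) * y * (1.0 - y)
            for i in range(n_inputs):
                grad[i] += delta * x[i]
        # update rule: w <- w - eta * grad E
        w = [wi - eta * gi for wi, gi in zip(w, grad)]
    return w

# Hypothetical toy training set: logical OR, with a bias input appended.
data = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]
w = train(data, 3)
```

Even in this three-weight case the structure is the same as in the high-dimensional setting: the gradient of $E$ is accumulated over the whole training set before each weight update.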