This module breaks down a standard neural network, describing the different parameters, hyper-parameters, and functions that are necessary for building a neural network.

Activation functions

The activation functions we explored were the sigmoid function, rectified linear units (ReLU), and the softmax normalization function.

Sigmoid function

Equation for the sigmoid activation function

The sigmoid is one of the most commonly used nonlinear activation functions in neural networks. Intuition might suggest that the activation of a neuron would be well modeled by a step function, but a step function is not differentiable at its threshold and has zero derivative everywhere else, so it gives gradient descent nothing to work with. The stochastic gradient descent algorithm requires that activation functions be differentiable. The solution is to approximate the step function with a smooth function such as the sigmoid or the hyperbolic tangent. The drawback of the sigmoid is that its derivative is near zero far from the origin, so if a neuron's weights are badly wrong and its net excitation is large in magnitude, the gradient is too small to correct them. As a result, outlier weights can significantly degrade the performance of the network.
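As a minimal sketch of the saturation problem, the snippet below assumes the standard logistic form of the sigmoid, sigma(z) = 1 / (1 + e^(-z)), where z is the net excitation. The function names are illustrative and are not taken from the project's code.

```python
import math

def sigmoid(z):
    """Logistic sigmoid of the net excitation z (assumed standard form)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid: sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# The derivative peaks at the origin and shrinks rapidly away from it.
for z in (0.0, 2.0, 5.0, 10.0):
    print(f"z = {z:5.1f}  sigmoid = {sigmoid(z):.5f}  derivative = {sigmoid_prime(z):.5f}")
```

The derivative is largest at z = 0, where it equals 0.25, and is already negligible by z = 10, which is why a saturated neuron barely moves under gradient descent.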

Rectified linear units

Equation for the ReLU activation function

The advantage of using rectified linear units is threefold. First, the derivative is piecewise constant (either 0 or 1), making the gradient much cheaper to compute. Second, a ReLU is a better approximation of how biological neurons fire, in the sense that there is no activation in the absence of stimulation. Third, rectified linear units speed up learning because they cannot fire with zero net excitation: if an excitation fails to overcome a neuron’s bias, the neuron does not fire at all, and when it does fire, the activation is linearly proportional to the excitation. The sigmoid function, in comparison, allows some activation to occur with zero and even negative net excitation. However, a lower learning rate needs to be used with ReLU, because its zero derivative for net excitations below zero means that a neuron effectively stops learning once its net excitation becomes negative.
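A short sketch of the behavior described above, assuming the usual definition relu(z) = max(0, z). The derivative is undefined exactly at z = 0; treating it as 0 there is a common convention and an assumption of this example.

```python
def relu(z):
    """Rectified linear unit: zero for non-positive excitation, linear otherwise."""
    return max(0.0, z)

def relu_prime(z):
    """Derivative of ReLU; the value at z == 0 is set to 0 by convention."""
    return 1.0 if z > 0 else 0.0

for z in (-2.0, 0.0, 0.5, 3.0):
    print(f"z = {z:4.1f}  relu = {relu(z):.1f}  derivative = {relu_prime(z):.1f}")
```

For any negative net excitation both the activation and the derivative are zero, so no gradient flows back through the neuron for that input.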

Softmax

Equation for the softmax activation function

Softmax activation is particularly useful on the output layer, as it normalizes the outputs so that they sum to one. Exponentiating each of the net excitations exaggerates the differences between them: weak excitations become weaker activations and strong excitations become stronger activations. Everything is then normalized, which effectively turns the layer into a decision-maker.
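The exponentiate-then-normalize step can be sketched as follows, assuming the standard softmax a_j = e^(z_j) / sum_k e^(z_k). Subtracting the largest excitation before exponentiating is a numerical-stability choice made for this example; it does not change the result.

```python
import math

def softmax(zs):
    """Softmax over a list of net excitations."""
    m = max(zs)  # shift by the maximum to avoid overflow; the output is unchanged
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

activations = softmax([1.0, 2.0, 4.0])
print(activations)       # the largest excitation takes most of the probability mass
print(sum(activations))  # the activations sum to one, up to rounding
```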

Cost functions

The cost functions we explored for the gradient descent learning algorithm were mean-squared error, cross-entropy, and log-likelihood.

Mean-squared error

Equation for the mean-squared error cost function

Mean-squared error is the simplest measure of difference that can be used to good effect in a neural network. It can be used with any activation function, which makes it the most versatile option, though not always the most effective one. One of its shortcomings is that neurons with a sigmoid activation function saturate quickly: because the sigmoid’s derivative is small far from the origin, the gradient of the cost becomes small as well and the neurons are unable to learn further.
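The following sketch illustrates the saturation issue for a single sigmoid neuron. It assumes the cost is the plain mean of squared differences; other normalizations (for example an extra factor of 1/2) differ only by a constant. The helper names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse(outputs, targets):
    """Mean of the squared differences between outputs and targets."""
    return sum((a - y) ** 2 for a, y in zip(outputs, targets)) / len(outputs)

def mse_grad_wrt_z(z, y):
    """Gradient of (sigmoid(z) - y)**2 with respect to the net excitation z."""
    a = sigmoid(z)
    # Chain rule: 2 * (a - y) * sigmoid'(z), with sigmoid'(z) = a * (1 - a).
    return 2.0 * (a - y) * a * (1.0 - a)

# The target is 0, so a larger z means a larger error, yet the gradient shrinks.
for z in (0.0, 2.0, 8.0):
    print(f"z = {z:3.1f}  output = {sigmoid(z):.4f}  gradient = {mse_grad_wrt_z(z, 0.0):.6f}")
```

The neuron that is most wrong (z = 8 with a target of 0) receives the smallest gradient, because the sigmoid's derivative appears as a factor in it.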

Cross-entropy

Equation for the cross-entropy cost function

Cross-entropy treats the desired output as one probability distribution and the network’s output as another, and measures how far apart the two distributions are. Its main attraction is that, when used in conjunction with the sigmoid activation function, its gradient depends linearly on the error, which solves the problem of neurons saturating quickly.
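To make the linear dependence on the error concrete, the sketch below assumes the per-neuron binary form of cross-entropy, C = -[y ln(a) + (1 - y) ln(1 - a)], with a = sigmoid(z). Under that assumption the gradient with respect to z simplifies to a - y. The function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(a, y):
    """Binary cross-entropy for one output a in (0, 1) and target y."""
    # Note: math.log fails if a is exactly 0 or 1; real code usually clips a.
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

def cross_entropy_grad_wrt_z(z, y):
    """With a sigmoid output, dC/dz reduces to the error a - y."""
    return sigmoid(z) - y

# Even for a badly saturated neuron (target 0, large z) the gradient stays large.
for z in (0.0, 2.0, 8.0):
    print(f"z = {z:3.1f}  cost = {cross_entropy(sigmoid(z), 0.0):.4f}  "
          f"gradient = {cross_entropy_grad_wrt_z(z, 0.0):.4f}")
```

The sigmoid's derivative cancels out of the gradient, so the size of the update tracks the size of the error instead of vanishing when the neuron saturates.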

Log-likelihood

Equation for the log-likelihood cost function

The log-likelihood cost depends only on the activation of the output neuron that should be firing, and minimizing the cost maximizes that activation. Used in conjunction with a softmax layer, all other output neurons are minimized as a result of maximizing the desired output neuron, because the activations must sum to one. In this sense, a softmax layer has to be used, or the activations of the final layer will be too close together to draw meaningful conclusions.
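A minimal sketch of this pairing, assuming the cost for one input is C = -ln(a_y), where a_y is the softmax activation of the correct output neuron. With that assumption the gradient with respect to each net excitation z_j is a_j minus 1 for the correct neuron and a_j for every other neuron. Names and sample values are hypothetical.

```python
import math

def softmax(zs):
    m = max(zs)  # shift for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood_cost(zs, target):
    """Negative log of the softmax activation of the correct output neuron."""
    return -math.log(softmax(zs)[target])

def log_likelihood_grad(zs, target):
    """Gradient of the cost with respect to each net excitation z_j."""
    acts = softmax(zs)
    return [a - (1.0 if j == target else 0.0) for j, a in enumerate(acts)]

zs = [1.0, 2.0, 4.0]
grad = log_likelihood_grad(zs, target=0)
print(log_likelihood_cost(zs, target=0))
print(grad)  # negative for the target neuron, positive for the others
```

Descending this gradient raises the excitation of the target neuron and lowers all the others at once, which is the effect described above.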

Stochastic gradient descent

Equation for the SGD learning algorithm, applied to both the weights and biases.

Stochastic gradient descent is the algorithm our network uses to adjust weights and biases according to the gradient of a given cost function. The gradient determines whether a parameter should increase or decrease, and by how much. The learning rate of a network is a constant that sets how far a parameter moves down its gradient at each update. In the original algorithm, parameters are updated after every input. A common practice with neural nets is to reevaluate the gradient only after a so-called mini-batch of multiple inputs has been passed through. This way, each gradient estimate is averaged over several samples and is therefore more reliable, yet it is still somewhat different every time it is evaluated. This noise in the gradient makes it harder for parameters to get stuck in a local minimum of the cost function.
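The mini-batch update can be sketched on a toy problem. The example below is not the project's network: it assumes a single linear neuron a = w*x + b trained on the squared error (a - y)^2 / 2, and the data, learning rate, and batch size are made up for illustration.

```python
import random

def sgd(data, w, b, learning_rate=0.1, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w*x + b by mini-batch SGD on the squared error (a - y)**2 / 2."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # a different split into mini-batches every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Average the per-sample gradients over the mini-batch.
            grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum((w * x + b - y) for x, y in batch) / len(batch)
            # Step each parameter down its gradient, scaled by the learning rate.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Toy data drawn from the line y = 2x + 1 (hypothetical example values).
points = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]
w, b = sgd(points, w=0.0, b=0.0)
print(w, b)  # close to 2 and 1
```

Setting batch_size to 1 recovers the original per-input update, while setting it to the size of the data set gives ordinary (non-stochastic) gradient descent.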

Dropout

Overfitting occurs when neurons are trained to identify specific images in a training set rather than the more general concept that an image represents. Instead of recognizing a 7, the network may only recognize the particular 7s that were in the training data set. To prevent this, we implemented dropout in our network. For each mini-batch, a random subset of the neurons in our interconnected layers was turned off, so the corresponding weights could not be used in determining an output. This essentially means that we were training a slightly different network on each mini-batch, which encourages more neurons to learn meaningfully, as weights will typically be more evenly distributed. When the network is evaluated, all neurons are turned back on and their weights are scaled down according to the dropout rate. As a result, neurons are less strongly tied to particular images and apply to a broader set of images.
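The two phases of dropout can be sketched as below. This is an illustration under stated assumptions rather than the project's implementation: the helper names are hypothetical, and the evaluation-time scaling uses the kept fraction (1 - dropout rate), which coincides with the dropout rate itself only when the rate is 0.5.

```python
import random

def dropout_mask(n, dropout_rate, rng):
    """One flag per neuron: 0.0 (dropped) with probability dropout_rate, else 1.0."""
    return [0.0 if rng.random() < dropout_rate else 1.0 for _ in range(n)]

def apply_dropout(activations, mask):
    """Silence the dropped neurons for the current mini-batch."""
    return [a * m for a, m in zip(activations, mask)]

def scale_for_evaluation(weights, dropout_rate):
    """At evaluation time all neurons are on, so scale weights by the kept fraction."""
    keep = 1.0 - dropout_rate
    return [w * keep for w in weights]

rng = random.Random(0)
mask = dropout_mask(1000, 0.5, rng)       # a new mask would be drawn per mini-batch
print(sum(mask))                          # roughly half of the neurons stay active
print(apply_dropout([1.0, 2.0], [0.0, 1.0]))
print(scale_for_evaluation([2.0, 4.0], 0.5))
```

Scaling at evaluation time keeps the expected input to each downstream neuron the same as it was during training, when only part of the layer was active.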

Source:  OpenStax, Elec 301 projects fall 2015. OpenStax CNX. Jan 04, 2016 Download for free at https://legacy.cnx.org/content/col11950/1.1