Welcome to CS With James
In this tutorial I will discuss the vanishing gradient problem and how it is solved by using the ReLU activation function.
The vanishing gradient problem occurs when a network becomes too deep and its results get worse than those of a shallow network. It happens because of the sigmoid activation function.
A neural network is trained by backpropagation, but if the network is too deep and uses the sigmoid activation function, the first several layers barely get trained, because the gradient signal shrinks as it travels backward through the layers.
This is a visualization of the sigmoid function.
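If you want to reproduce that curve yourself, here is a minimal sketch of the sigmoid function in Python. The function name and sample inputs are my own choices, not from the tutorial:

```python
import numpy as np

def sigmoid(x):
    """Sigmoid squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# The curve saturates: large positive or negative inputs land near 1 or 0.
print(sigmoid(0.0))    # 0.5
print(sigmoid(6.0))    # ≈ 0.9975
print(sigmoid(-6.0))   # ≈ 0.0025
```

Notice how quickly the outputs flatten out away from zero; that flat region is where the trouble starts.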
The problem is that the sigmoid function squashes each neuron's output into the range 0 to 1, and its derivative is never larger than 0.25. During backpropagation those small derivatives are multiplied together layer by layer, so the gradient reaching the early layers becomes tiny and the input hardly affects the output of the network.
That is why the deeper network turns out worse than the shallow network.
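We can put a number on that shrinkage. The sigmoid's derivative peaks at 0.25, and backpropagation multiplies roughly one such factor per layer, so even in the best case ten sigmoid layers scale the gradient by 0.25 to the 10th power. A small sketch (my own illustration, not the tutorial's network):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # maximum value is 0.25, reached at x = 0

# Backpropagation multiplies one such derivative per layer (times weights).
# Even in the best possible case, ten sigmoid layers scale the gradient by:
best_case = 0.25 ** 10
print(best_case)   # ≈ 9.54e-07
```

So by the time the gradient reaches the first layer of a 10-layer sigmoid network, it is already about a million times smaller, and in practice the derivatives are usually well below 0.25.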
This is a good visualization of the vanishing gradient problem.
The first few layers are faded out, so they are hardly affecting the result of the network.
The result is 57.93% accuracy, which is far worse than the network with 3 layers.
The ReLU activation function. ReLU stands for Rectified Linear Unit.
This is a visualization of the ReLU activation function.
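ReLU is simple to write down: it passes positive inputs through unchanged and zeroes out the rest, so its derivative is exactly 1 for any active unit. A minimal sketch (function names are mine):

```python
import numpy as np

def relu(x):
    """ReLU keeps positive inputs unchanged and zeroes out the rest."""
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative is 1 for positive inputs and 0 otherwise, so active
    # units pass gradients backward without shrinking them at all.
    return (x > 0).astype(float)

print(relu(np.array([-2.0, 0.0, 3.0])))       # [0. 0. 3.]
print(relu_grad(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 1.]
```

That derivative of 1 is the whole point: multiplying by 1 layer after layer leaves the gradient intact, unlike multiplying by at most 0.25.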
The result gets much better:
94.61% accuracy, which is not perfect, but with the ReLU activation function the deep neural network actually works.
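To see why the deep network trains now, compare the per-layer gradient factor each activation contributes across 20 layers. The pre-activation values below are made up for illustration, and the weight matrices are ignored to isolate the activation's effect:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical pre-activations, one per layer in a 20-layer network.
pre_acts = np.array([0.5, -1.2, 2.0, 0.3, -0.7, 1.5, -0.2, 0.8, -1.0, 0.1,
                     1.1, -0.4, 0.6, 2.2, -1.5, 0.9, 0.2, -0.6, 1.3, 0.4])

# Gradient factor contributed by the activation at each layer:
sig_factors = sigmoid(pre_acts) * (1 - sigmoid(pre_acts))  # each <= 0.25
relu_factors = np.where(pre_acts > 0, 1.0, 0.0)            # exactly 1 on active units

print("sigmoid product over 20 layers:", np.prod(sig_factors))  # vanishingly small
print("relu factor on any active unit:", relu_factors.max())    # 1.0
```

With sigmoid, the product collapses toward zero, so the first layers stop learning; with ReLU, every active unit passes the gradient through at full strength, which is why the 94.61% result becomes possible.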